Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX

Liquid AI shipped LFM2.5-230M, it’s the company’s smallest model to date. The release targets a specific job: running agentic tasks on phones, robots, and automation devices. Both the base and instruction-tuned checkpoints are open-weight on Hugging Face.

The pitch is narrow on purpose. This is not a general reasoning model. It is built for data extraction and tool use on edge hardware.

TL;DR

Liquid AI’s LFM2.5-230M is its smallest model yet: 230M params, open-weight, built on LFM2.
Runs on-device at 213 tok/s on a Galaxy S25 Ultra and 42 on a Raspberry Pi 5.
Beats larger models (Qwen3.5-0.8B, Gemma 3 1B) on instruction following and data extraction.
Tuned for tool use and extraction; not for math, code generation, or creative writing.
Day-one support across llama.cpp, MLX, vLLM, SGLang, and ONNX, with a 293–375 MB footprint.

What is LFM2.5-230M?

LFM2.5-230M is a 230-million-parameter, text-only model. It is built on the LFM2 architecture. The model has 14 layers total. Eight are double-gated LIV convolution blocks. The remaining six are grouped-query attention (GQA) blocks. The hybrid layout targets fast CPU inference.

The context length is 32,768 tokens. The vocabulary size is 65,536. The knowledge cutoff is mid-2024. It supports ten languages, including English, Chinese, Arabic, and Japanese.

Liquid AI team ships two checkpoints. LFM2.5-230M-Base is the pre-trained model for fine-tuning. LFM2.5-230M is the general-purpose instruction-tuned version. The license is lfm1.0.

Training and Post-Training

The model was pre-trained on 19 trillion tokens. That total includes a 32K context extension phase. The post-training recipe then runs in three stages.

First comes supervised fine-tuning with distillation from the larger LFM2.5-350M. Second is direct preference optimization (DPO). Third is multi-domain reinforcement learning. This preserves flexibility for downstream specialization.

The distillation step is what keeps a 230M model competitive with larger checkpoints. It inherits behavior from the bigger LFM2.5-350M on targeted tasks.

Benchmark

Liquid AI team evaluated LFM2.5-230M across ten benchmarks. They span knowledge, instruction following, data extraction, and tool use.

The instruction-following results support that. On IFEval, LFM2.5-230M scores 71.71. That beats Qwen3.5-0.8B (59.94) and Gemma 3 1B IT (63.49). On IFBench it scores 38.40, ahead of both. On CaseReportBench, a clinical data-extraction test, it scores 22.51.

Model	Params	IFEval	IFBench	CaseReportBench	BFCLv4	MMLU-Pro
LFM2.5-230M	230M	71.71	38.40	22.51	21.03	20.25
LFM2.5-350M	350M	76.96	40.69	32.45	21.86	20.01
Granite 4.0-H-350M	350M	61.27	17.22	12.44	13.28	13.14
Qwen3.5-0.8B (Instruct)	800M	59.94	22.87	13.83	18.70	37.42
Gemma 3 1B IT	1B	63.49	20.33	2.28	7.17	14.04

LFM2.5-230M leads on instruction following and data extraction. It trails on broad knowledge: MMLU-Pro is 20.25, behind Qwen3.5-0.8B’s 37.42. It is also weak on some agentic tool use. On τ²-Bench Telecom it scores just 5.26.

Liquid AI is direct about the limits. It does not recommend the model for reasoning-heavy workloads. That means advanced math, code generation, and creative writing.

Use Cases With Examples

The model fits two jobs well.

The first is large-scale data extraction pipelines. Picture a pipeline parsing 100,000 clinical reports into structured fields. A 4-bit build with a 293–375 MB memory footprint runs that on commodity CPUs. You extract locally, with no per-token API bill.

The second job is lightweight on-device agentic workloads. Think a home automation hub that turns speech into tool calls. Or a phone assistant that routes a request to the right function.

As an early signal, Liquid AI deployed the model on a Unitree G1 humanoid robot. It ran entirely on the robot’s onboard NVIDIA Jetson Orin. There the model acted as a skill-selection layer. It turned one natural-language instruction into a sequence of tool calls. Those calls invoked low-level skills from NVIDIA’s SONIC framework.

LFM2.5 supports function calling in four steps. You define tools as JSON in the system prompt. The model writes a Pythonic function call between special tokens. You execute the call and return the result. The model then writes a plain-text answer.

By default the call is a Python list. It sits between the <|tool_call_start|> and <|tool_call_end|> tokens. Here is the documented pattern, with the tool JSON abbreviated:

<|im_start|>system
List of tools: [{"name": "get_candidate_status",
  "parameters": {"candidate_id": {"type": "string"}}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>

You can also force JSON-formatted calls through the system prompt.

Running It: A Minimal Example

The model works with Transformers 5.0.0 and up. The recommended generation settings are temperature 0.1, top_k 50, and repetition_penalty 1.05. Note the do_sample=True flag, which is required for those sampling settings to apply.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-230M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is C. elegans?"}],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Liquid AI also publishes fine-tuning recipes. They cover SFT, DPO, and GRPO with LoRA, via Unsloth and TRL. Each ships as a Colab notebook.

Interactive Explainer

‘+m.n+’ ‘+m.p+’‘+ ”+ ‘‘+m.d[idx].toFixed(2)+’

Source link

CEO & Founder

Moiz Ahmad

Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference

TL;DR

What is LFM2.5-230M?

Training and Post-Training

Benchmark

Use Cases With Examples

Running It: A Minimal Example

Interactive Explainer

Leave a Reply Cancel reply

Trending News

National

Sports

AI & Tech

CEO & Founder

TL;DR

What is LFM2.5-230M?

Training and Post-Training

Benchmark

Use Cases With Examples

Running It: A Minimal Example

Interactive Explainer

Leave a Reply Cancel reply

Related News

Popular News

Trending News

Recent News