IBM has launched Granite-Docling-258M, an open-source (Apache-2.0) vision-language model designed specifically for end-to-end document conversion. The model targets layout-faithful extraction (tables, code, equations, lists, captions, and reading order), emitting a structured, machine-readable representation rather than lossy Markdown. It is available on Hugging Face with a live demo and an MLX build for Apple Silicon.
What’s new compared to SmolDocling?
Granite-Docling is the production-ready successor to SmolDocling-256M. IBM replaced the earlier backbone with a Granite 165M language model and upgraded the vision encoder to SigLIP2 (base, patch16-512), while retaining the Idefics3-style connector (pixel-shuffle projector). The resulting model has 258M parameters and shows consistent accuracy gains across layout analysis, full-page OCR, code, equations, and tables (see metrics below). IBM also addressed instability failure modes observed in the preview model (e.g., repetitive token loops).
Architecture and training pipeline
- Backbone: Idefics3-derived stack with SigLIP2 vision encoder → pixel-shuffle connector → Granite 165M LLM.
- Training framework: nanoVLM (a lightweight, pure-PyTorch VLM training toolkit).
- Representation: outputs DocTags, an IBM-authored markup designed for unambiguous document structure (elements + coordinates + relationships), which downstream tools convert to Markdown/HTML/JSON.
- Compute: trained on IBM’s Blue Vela H100 cluster.
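The pixel-shuffle connector above can be understood as a space-to-depth rearrangement that folds each small neighborhood of vision tokens into the channel axis, shrinking the token sequence the LLM must consume. The sketch below is illustrative only; the grid size, embedding dimension, and shuffle ratio are assumptions for demonstration, not Granite-Docling’s actual configuration.

```python
import numpy as np

def pixel_shuffle(tokens: np.ndarray, ratio: int = 2) -> np.ndarray:
    """Merge each ratio x ratio neighborhood of patch embeddings into the
    channel axis, reducing the visual token count by ratio**2
    (the Idefics3-style space-to-depth projection).

    tokens: (H, W, C) grid of vision-encoder patch embeddings.
    returns: (H // ratio, W // ratio, C * ratio**2) grid.
    """
    h, w, c = tokens.shape
    assert h % ratio == 0 and w % ratio == 0, "grid must divide evenly"
    x = tokens.reshape(h // ratio, ratio, w // ratio, ratio, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group each ratio x ratio neighborhood
    return x.reshape(h // ratio, w // ratio, c * ratio * ratio)

# Example: a 32x32 grid of 768-dim patch embeddings (1024 tokens) becomes
# a 16x16 grid of 3072-dim tokens (256 tokens) before the projector/LLM.
grid = np.random.rand(32, 32, 768)
out = pixel_shuffle(grid, ratio=2)
print(out.shape)  # (16, 16, 3072)
```

The token count drops 4x while no information is discarded; the connector’s linear projection (not shown) then maps the wider channels into the LLM’s embedding space.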
Quantified improvements (Granite-Docling-258M vs. SmolDocling-256M preview)
Evaluated with docling-eval, LMMS-Eval, and task-specific datasets:
- Layout: mAP 0.27 vs. 0.23; F1 0.86 vs. 0.85.
- Full-page OCR: F1 0.84 vs. 0.80; lower edit distance.
- Code recognition: F1 0.988 vs. 0.915; edit distance 0.013 vs. 0.114.
- Equation recognition: F1 0.968 vs. 0.947.
- Table recognition (FinTabNet @ 150 dpi): TEDS-structure 0.97 vs. 0.82; TEDS with content 0.96 vs. 0.76.
- Other benchmarks: MMStar 0.30 vs. 0.17; OCRBench 500 vs. 338.
- Stability: “avoids infinite loops more effectively” (a production-oriented fix).
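The edit-distance figures above are normalized scores where lower is better. A minimal sketch of one common normalization, Levenshtein distance divided by the longer string’s length (docling-eval’s exact variant may differ):

```python
def normalized_edit_distance(pred: str, ref: str) -> float:
    """Levenshtein distance divided by the longer string's length:
    0.0 is an exact match, 1.0 a complete mismatch."""
    m, n = len(pred), len(ref)
    if max(m, n) == 0:
        return 0.0
    prev = list(range(n + 1))  # row for the empty prefix of pred
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, n)

print(normalized_edit_distance("kitten", "sitting"))  # 3 edits / 7 chars ≈ 0.4286
```

On this scale, the reported drop from 0.114 to 0.013 on code recognition means predicted code blocks are now nearly character-exact against the reference.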
Multilingual support
Granite-Docling adds experimental support for Japanese, Arabic, and Chinese. IBM marks this as early-stage; English remains the primary target.
How the DocTags pathway changes document AI
Conventional OCR-to-Markdown pipelines lose structural information and complicate downstream retrieval-augmented generation (RAG). Granite-Docling emits DocTags, a compact, LLM-friendly structural grammar, which Docling converts into Markdown/HTML/JSON. This preserves table topology, inline and floating math, code blocks, captions, and reading order with explicit coordinates, improving index quality and grounding for RAG and analytics.
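To make the idea of a coordinate-bearing structural grammar concrete, here is a toy sketch. The tag vocabulary below is a simplified stand-in invented for illustration; the real DocTags grammar (its tag names, OTSL table encoding, and location tokens) is richer, and actual conversion is handled by Docling’s own tooling.

```python
import re

# Illustrative only: a toy subset of DocTags-like markup with <loc_*>
# coordinate tokens interleaved with structural elements.
DOCTAGS_SAMPLE = (
    "<title><loc_10><loc_12>Quarterly Report</title>"
    "<text><loc_10><loc_40>Revenue grew 12% year over year.</text>"
    "<code><loc_10><loc_80>print('hello')</code>"
)

def toy_doctags_to_markdown(doctags: str) -> str:
    """Walk structural elements in reading order, drop coordinate tokens,
    and map each element to its Markdown equivalent."""
    out = []
    for tag, body in re.findall(r"<(title|text|code)>(.*?)</\1>", doctags, re.S):
        body = re.sub(r"<loc_\d+>", "", body).strip()  # strip coordinates
        if tag == "title":
            out.append(f"# {body}")
        elif tag == "code":
            out.append(f"```\n{body}\n```")
        else:
            out.append(body)
    return "\n\n".join(out)

print(toy_doctags_to_markdown(DOCTAGS_SAMPLE))
```

The point of the intermediate form is that the lossy step (dropping coordinates and element types) happens last and on demand, so an index or RAG pipeline can keep the structured version while renderers emit Markdown or HTML.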
Inference and integration
- Docling integration (recommended): the docling CLI/SDK automatically pulls Granite-Docling and converts PDFs, office documents, and images to multiple formats. IBM positions the model as a component within Docling pipelines rather than a general-purpose VLM.
- Runtimes: works with Transformers, vLLM, ONNX, and MLX; a dedicated MLX build is optimized for Apple Silicon. A Hugging Face Space provides an interactive demo (ZeroGPU).
- License: Apache-2.0.
Why Granite-Docling?
For enterprise document AI, small VLMs that preserve structure reduce inference cost and pipeline complexity. Granite-Docling replaces several single-purpose models (layout, OCR, table, code, equations) with a single component that emits a richer intermediate representation, improving downstream retrieval and conversion fidelity. The measured gains in TEDS for tables, F1 for code and equations, and reduced instability make it a practical upgrade from SmolDocling for production workflows.
Summary
Granite-Docling-258M marks a significant advance in compact, structure-preserving document AI. By combining IBM’s Granite backbone, the SigLIP2 vision encoder, and the nanoVLM training framework, it delivers enterprise-ready performance across tables, equations, code, and multilingual text, all while remaining lightweight and open-source under Apache 2.0. With measurable gains over its SmolDocling predecessor and seamless integration into Docling pipelines, Granite-Docling provides a practical foundation for document conversion and RAG workflows where precision and reliability are essential.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.