Baidu's PaddlePaddle Team Releases PaddleOCR-VL (0.9B): A NaViT-Style + ERNIE-4.5-0.3B VLM Aimed at End-to-End Multilingual Document Parsing

How do you convert complex, multilingual documents (dense layouts, small scripts, formulas, charts, and handwriting) into faithful structured output with state-of-the-art accuracy, while keeping inference latency and memory low enough for real deployment? Baidu's PaddlePaddle group has released PaddleOCR-VL, a 0.9B-parameter vision-language model designed for end-to-end document parsing across text, tables, formulas, charts, and handwriting. The core model combines a NaViT-style (native-resolution ViT) dynamic-resolution vision encoder with the ERNIE-4.5-0.3B decoder. It supports 109 languages.

Understanding the system design
PaddleOCR-VL is deployed as a two-stage pipeline. Stage one (PP-DocLayoutV2) performs page-level layout analysis: an RT-DETR detector localizes and classifies regions, and a pointer network predicts reading order. Stage two (PaddleOCR-VL-0.9B) performs element-level recognition on the detected layout. The final output is aggregated into Markdown and JSON for downstream consumption. This decoupling reduces the long-sequence decoding latency and the instability that end-to-end VLMs face on dense, multi-column pages that mix text and graphics.
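The control flow of such a two-stage design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PaddleOCR-VL API: `Region`, `detect_layout`, `predict_reading_order`, `recognize_element`, and `parse_page` are hypothetical names, the detector and recognizer are stubs that read from a toy page dictionary, and the reading order is a naive top-to-bottom sort rather than a learned pointer network.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Region:
    kind: str          # e.g. "text", "table", "formula", "chart"
    bbox: tuple        # (x0, y0, x1, y1) in page pixels
    order: int = -1    # reading-order index, filled in by stage one
    content: str = ""  # recognized content, filled in by stage two


def detect_layout(page):
    """Stage one, step 1: stand-in for the layout detector (hypothetical stub)."""
    return [Region(kind=kind, bbox=bbox) for kind, bbox in page["regions"]]


def predict_reading_order(regions):
    """Stage one, step 2: stand-in for the reading-order model.

    Assumption: a plain top-to-bottom, left-to-right sort, not a learned model.
    """
    ranked = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    for index, region in enumerate(ranked):
        region.order = index
    return ranked


def recognize_element(page, region):
    """Stage two: stand-in for element-level recognition (hypothetical stub)."""
    return page["truth"][region.bbox]


def parse_page(page):
    """Run both stages, then aggregate the results into Markdown and JSON."""
    regions = predict_reading_order(detect_layout(page))
    for region in regions:
        region.content = recognize_element(page, region)
    markdown = "\n\n".join(region.content for region in regions)
    as_json = json.dumps([asdict(region) for region in regions])
    return markdown, as_json


# Toy page: regions are listed out of order on purpose.
page = {
    "regions": [
        ("text", (0, 200, 500, 260)),
        ("text", (0, 0, 500, 60)),
        ("formula", (0, 100, 500, 160)),
    ],
    "truth": {
        (0, 0, 500, 60): "# Title",
        (0, 100, 500, 160): "$$E = mc^2$$",
        (0, 200, 500, 260): "Body text.",
    },
}
markdown, as_json = parse_page(page)
print(markdown)
```

Because each element is recognized on its own, the recognizer only ever decodes short sequences, which is the latency and stability argument made above.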
At the model level, PaddleOCR-VL-0.9B combines a NaViT-style dynamic high-resolution encoder (native-resolution sequence packing) with a 2-layer MLP projector and the ERNIE-4.5-0.3B language model; 3D-RoPE is used for positional representation. The technical report attributes fewer hallucinations and better performance on text-dense pages to native-resolution processing, compared with fixed-resize or tiling approaches. The NaViT idea, patch-and-pack handling of variable-resolution inputs without destructive resizing, comes from prior work that showed improved efficiency and robustness; PaddleOCR-VL adopts this encoder style directly.
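To make the patch-and-pack idea concrete, here is a minimal Python sketch that splits images into patches at their native resolution and packs the resulting token runs into fixed-length sequences. It is an illustration only, not the encoder used in PaddleOCR-VL: the patch size of 14, the greedy first-fit strategy, and the helper names `patch_grid`, `pack_images`, and `segment_ids` are all assumptions made for this example.

```python
import math

PATCH = 14  # assumed patch size in pixels, chosen only for illustration


def patch_grid(height, width, patch=PATCH):
    """Patch rows and columns needed to cover an image at native resolution."""
    return math.ceil(height / patch), math.ceil(width / patch)


def pack_images(sizes, max_tokens):
    """Greedy first-fit packing of per-image patch runs into fixed-length sequences.

    Returns a list of packs; each pack is a list of (image_id, n_tokens).
    """
    packs = []  # packed sequences
    free = []   # remaining token capacity of each pack
    for image_id, (height, width) in enumerate(sizes):
        rows, cols = patch_grid(height, width)
        n_tokens = rows * cols
        if n_tokens > max_tokens:
            raise ValueError(f"image {image_id} needs {n_tokens} tokens")
        for i, capacity in enumerate(free):
            if n_tokens <= capacity:
                packs[i].append((image_id, n_tokens))
                free[i] -= n_tokens
                break
        else:
            # No existing pack has room, so open a new one.
            packs.append([(image_id, n_tokens)])
            free.append(max_tokens - n_tokens)
    return packs


def segment_ids(pack, max_tokens):
    """Per-token image id, with -1 for padding.

    A model can use these ids to block attention between different images.
    """
    ids = []
    for image_id, n_tokens in pack:
        ids.extend([image_id] * n_tokens)
    ids.extend([-1] * (max_tokens - len(ids)))
    return ids


# Three images with different aspect ratios, none of them resized.
sizes = [(28, 56), (70, 42), (14, 140)]
packs = pack_images(sizes, max_tokens=32)
ids = segment_ids(packs[0], max_tokens=32)
print(packs)
```

The point of the sketch is that a wide banner and a tall column each keep their own patch grid; only padding is added, so small glyphs are never shrunk to fit a fixed square input.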
Benchmarks
PaddleOCR-VL achieves state-of-the-art results on OmniDocBench v1.5 and competitive or leading scores on v1.0, covering overall quality as well as the sub-tasks (text edit distance, Formula-CDM, Table-TEDS / TEDS-S, and reading-order edit distance), with complementary strengths on olmOCR-Bench and on handwriting, table, formula, and chart evaluations.
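For readers unfamiliar with edit-distance metrics, the sketch below computes a Levenshtein distance and a length-normalized variant in Python. It is a generic illustration: the normalization by the longer sequence is a common convention assumed here, and the benchmark's exact scoring rules may differ. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance divided by the longer length; 0.0 is a perfect match."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))


# Text comparison: characters are the units.
text_score = normalized_edit_distance("kitten", "sitting")

# Reading-order comparison: region indices are the units.
order_score = normalized_edit_distance([0, 2, 1, 3], [0, 1, 2, 3])
print(text_score, order_score)
```

The same routine works on strings and on lists of region indices, which is why one metric family can score both recognized text and predicted reading order. Lower is better in both cases.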


Key Takeaways
- The 0.9B-parameter PaddleOCR-VL combines a NaViT-style dynamic-resolution encoder with ERNIE-4.5-0.3B for document parsing.
- It targets end-to-end extraction of text, tables, formulas, charts, and handwriting, with structured Markdown / JSON output.
- It claims SOTA performance on public document benchmarks, with inference fast enough for deployment.
- It supports 109 languages, including small scripts and complex page layouts.
This release is meaningful because it joins a NaViT-style dynamic-resolution visual encoder with the lightweight ERNIE-4.5-0.3B decoder to deliver SOTA page-level document parsing and element-level recognition at practical inference cost. The two-stage PP-DocLayoutV2 → PaddleOCR-VL-0.9B design stabilizes reading order and preserves native typography cues, which matter for small scripts, formulas, charts, and handwriting across 109 languages. Structured Markdown / JSON output and optional vLLM / SGLang acceleration make the system operationally clean for production document intelligence.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform draws more than two million monthly views, which shows its popularity among readers.



