Generative AI

Mistral AI Releases Mistral OCR 3: An Optical Character Recognition (OCR) Model for AI-Ready Documents at Scale

Mistral AI has released Mistral OCR 3, its latest optical character recognition service and the model that powers the company's Document AI stack. The model, named mistral-ocr-2512, is designed to extract text and images from PDFs and other documents while preserving formatting, and it does this at an aggressive price of $2 per 1,000 pages, with a 50% discount when used with the Batch API.

What is Mistral OCR 3 optimized for?

Mistral OCR 3 targets common business-document workloads. The model is optimized for forms, scanned documents, complex tables, and handwriting. It was evaluated on internal benchmarks drawn from real business use cases, where it achieves a 74% overall win rate over Mistral OCR 2 across these document categories, using a fuzzy-matching metric against ground truth.
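To make the evaluation approach concrete, here is a minimal sketch of a fuzzy-matching metric and a pairwise win rate, using Python's `difflib.SequenceMatcher` as the similarity function. This illustrates the general idea only; Mistral's internal benchmark and its exact metric are not public, so the functions below are assumptions.

```python
# Minimal sketch of a fuzzy-matching accuracy metric and pairwise win rate.
# Illustrative only: Mistral's internal benchmark details are not public.
from difflib import SequenceMatcher


def fuzzy_score(ocr_text: str, ground_truth: str) -> float:
    """Return a similarity ratio in [0, 1] between OCR output and ground truth."""
    return SequenceMatcher(None, ocr_text, ground_truth).ratio()


def win_rate(model_a: list, model_b: list, truths: list) -> float:
    """Fraction of documents where model A's output is closer to ground truth."""
    wins = sum(
        fuzzy_score(a, t) > fuzzy_score(b, t)
        for a, b, t in zip(model_a, model_b, truths)
    )
    return wins / len(truths)
```

A 74% win rate in this framing would mean that on 74% of benchmark documents, OCR 3's output scored strictly higher against ground truth than OCR 2's.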

The resulting markdown output preserves the document structure and, when table formatting is enabled, enriches the output with HTML-based table representations. This combination gives downstream systems both the content and the structural information needed for retrieval pipelines, analytics, and agent workflows.

Where OCR 3 Fits in Mistral Document AI

OCR 3 sits within Mistral Document AI, the company's document-processing offering, which includes OCR, structured data extraction, and Document QnA.

OCR 3 now powers the Document AI Playground in Mistral AI Studio. There, users upload PDFs or images and receive plain text or structured JSON without writing code. The same underlying OCR pipeline is accessible through a public API, allowing teams to move from interactive testing to production workloads without changing the core model.

Input, Output, and Structure

The OCR processor accepts multiple document formats through a single API. A document field can point to:

  • document_url for PDFs, pptx, docx, and more
  • image_url for image types such as png, jpeg, or avif
  • base64-encoded PDFs or images uploaded with the same schema

This is documented in the OCR Processor section of Mistral's Document AI documentation.
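The three input shapes above can be sketched as request payloads. The field names follow the description in Mistral's Document AI documentation, but treat the exact parameter names as assumptions and check the OCR Processor docs before relying on them.

```python
# Sketch of the three document-input shapes described above.
# Field names mirror Mistral's Document AI docs but are assumptions here.
import base64


def document_url_payload(url: str) -> dict:
    # Hosted document: PDF, pptx, docx, and more.
    return {"type": "document_url", "document_url": url}


def image_url_payload(url: str) -> dict:
    # Hosted image such as png, jpeg, or avif.
    return {"type": "image_url", "image_url": url}


def base64_pdf_payload(pdf_bytes: bytes) -> dict:
    # Base64-encoded PDF inlined as a data URL, using the same schema.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{encoded}",
    }
```

With the official SDK, a payload like this would typically be passed as the `document` argument of an OCR call against the mistral-ocr-2512 model.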

The response is a JSON object with a pages list. Each page contains an index, a markdown string, a list of images, a list of tables when table_format="html" is used, detected links, optional header and footer fields when header or footer output is enabled, and a dimensions object with the page size. There is also a document_annotation field for structured annotations and a usage_info block with accounting information.

When images and HTML tables are extracted, the markdown includes placeholders such as ![img-0.jpeg](img-0.jpeg) and [tbl-3.html](tbl-3.html). These placeholders are mapped back to the original content via the images and tables arrays in the response, making it easier to reconstruct the full document.
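A minimal sketch of that reconstruction step might look like the following. The page structure is assumed from the description above (markdown plus parallel `images` and `tables` arrays keyed by placeholder id); the exact field names in the real response may differ.

```python
# Sketch: splice extracted images and HTML tables back into page markdown.
# The page dict shape (markdown, images, tables, id keys) is an assumption
# based on the placeholder scheme described in the article.

def reconstruct_page(page: dict) -> str:
    text = page["markdown"]
    for img in page.get("images", []):
        # Replace ![img-0.jpeg](img-0.jpeg) with an inline base64 data URL.
        placeholder = f"![{img['id']}]({img['id']})"
        text = text.replace(placeholder, f"![{img['id']}]({img['image_base64']})")
    for tbl in page.get("tables", []):
        # Replace [tbl-3.html](tbl-3.html) with the reconstructed HTML table.
        placeholder = f"[{tbl['id']}]({tbl['id']})"
        text = text.replace(placeholder, tbl["html"])
    return text


sample_page = {
    "index": 0,
    "markdown": "Q3 results\n\n[tbl-0.html](tbl-0.html)",
    "images": [],
    "tables": [{"id": "tbl-0.html", "html": "<table><tr><td>42</td></tr></table>"}],
}
print(reconstruct_page(sample_page))
```

Keeping placeholders in the markdown while shipping the heavy content in side arrays lets retrieval pipelines index the text cheaply and resolve images or tables only when needed.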

Improvements over Mistral OCR 2

Mistral OCR 3 introduces several notable improvements relative to OCR 2. The public release notes emphasize four key areas.

  • Handwriting: Mistral OCR 3 more accurately interprets complex, mixed content and handwritten text overlaid on printed templates.
  • Forms: Improved detection of boxes, labels, and handwritten entries in dense structures such as invoices, receipts, compliance forms, and government documents.
  • Scanned and complex documents: The model is more robust to artifacts, skew, distortion, low DPI, and background noise in scanned pages.
  • Complex tables: Rebuilds table layouts with headers, merged cells, multi-row blocks, and column spans, and can emit correct HTML tables with colspan and rowspan attributes to maintain structure.

Pricing, Batch Inference, and Annotations

The OCR 3 model card lists the price at $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages when using built-in annotations.

Mistral also exposes OCR 3 through its Batch Inference API (/v1/batch), which is documented in the integration section of the platform. Batch processing reduces the cost of running OCR to $1 per 1,000 pages via a 50% discount for jobs submitted as asynchronous batches.
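The arithmetic behind those numbers is simple enough to sketch directly from the listed prices ($2 per 1,000 standard pages, $3 per 1,000 annotated pages, 50% off via batch):

```python
# Cost sketch using the prices listed in the article.
PRICE_PER_1K = {"standard": 2.00, "annotated": 3.00}
BATCH_DISCOUNT = 0.50


def ocr_cost(pages: int, mode: str = "standard", batch: bool = False) -> float:
    """Estimated USD cost for an OCR job at the listed per-1,000-page rates."""
    cost = pages / 1000 * PRICE_PER_1K[mode]
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return round(cost, 2)


print(ocr_cost(1_000_000))              # one million pages, standard pricing
print(ocr_cost(1_000_000, batch=True))  # same job through the Batch API
```

At a million pages, the difference is $2,000 versus $1,000, which is why the batch path matters for large backfill jobs.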

The model also supports two notable features through the same API: structured Annotations and BBox extraction. These allow developers to attach schema-driven labels to document regions and retrieve bounding boxes for text and other elements, which is useful when mapping content to downstream systems or UI overlays.
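As a hypothetical illustration of those two features: a schema-driven annotation request might supply a JSON-Schema-style object describing the fields to extract, and a UI overlay would convert returned bounding boxes into pixel coordinates. Both the schema shape and the bbox field names below are illustrative assumptions, not Mistral's exact formats.

```python
# Hypothetical annotation schema and bbox-overlay helper.
# The schema shape and the x0/y0/x1/y1 field names are assumptions for
# illustration; consult Mistral's Annotations docs for the real format.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "due_date": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}


def bbox_to_pixels(bbox: dict, page_width: int, page_height: int) -> tuple:
    """Convert a normalized [0, 1] bounding box to pixel coordinates for a UI overlay."""
    return (
        int(bbox["x0"] * page_width),
        int(bbox["y0"] * page_height),
        int(bbox["x1"] * page_width),
        int(bbox["y1"] * page_height),
    )


print(bbox_to_pixels({"x0": 0.1, "y0": 0.2, "x1": 0.5, "y1": 0.25}, 1000, 2000))
```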

Key Takeaways

  1. Model and role: Mistral OCR 3, named mistral-ocr-2512, is a new OCR service that powers Mistral's Document AI stack for page-based document understanding.
  2. Accuracy gains: In internal benchmarks covering forms, scanned documents, complex tables, and handwriting, OCR 3 achieves an overall win rate of 74% over Mistral OCR 2, and Mistral positions it as state of the art against traditional OCR and AI systems.
  3. Structured output for RAG: The service extracts text along with links and embedded images and returns layout-enriched output with reconstructed HTML tables, preserving document and table structure so that the output can be fed into RAG, agent, and search pipelines with little additional parsing.
  4. API and document formats: Developers access OCR 3 via the /v1/ocr endpoint or SDKs, passing PDFs as document_url and images such as png or jpeg as image_url, and can enable options such as HTML table output, header or footer output, and base64 images in the response.
  5. Pricing and bulk processing: OCR 3 is priced at $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages, and with the Batch API the effective price of standard OCR drops to $1 per 1,000 pages for large jobs.

Check out the TECHNICAL DETAILS. Feel free to check out our GitHub page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


Michal Sutter is a data science expert with a Master of Science in Data Science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex data sets into actionable insights.
