An Unbiased Review of Snowflake Document AI

As data people, we are comfortable with tabular data…
We can also handle free text, JSON, XML feeds, and the like. But what about a cardboard box full of things like this?

The information on this receipt is locked away from the database. Wouldn't it be great if we could scan all of these, run them through an LLM, and store the results in a table?
Lucky for us, we are living in the age of Document AI. Document AI combines OCR with LLMs and lets us build a bridge between the paper world and the digital database world.
All the major cloud vendors have some version of this…
Here I will share my thoughts on Snowflake Document AI. Beyond using Snowflake at work, I have no affiliation with Snowflake. Nobody paid me to write this, and I am not part of any ambassador program. All of that is to say: I can give an unbiased review of Snowflake Document AI.
What is Document AI?
Document AI lets users automatically extract information from digitized documents. When we say "documents" here, we mean images with words on them. Don't confuse this with the "document" concept from NoSQL databases.
The product combines OCR and LLM models so that the user can define a set of questions and run those questions against a large collection of documents at once.

LLMs and OCR are both error-prone. Snowflake addresses this by (1) banging their heads against the OCR problem so I don't have to – I will trust the folks at Snowflake on that one – and (2) letting me fine-tune the LLM on my own documents.
Thankfully, fine-tuning the Snowflake LLM feels more like data labeling than anything else. I review 20 documents, hit the retrain button, and rinse and repeat until the performance is satisfactory. Am I even a data scientist anymore?
Once the model is trained, I can run my prompts against 1,000 documents at a time. I like to save the results to a table, but you can do whatever you want with the raw results.
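To make that concrete, here is roughly what that step looks like in SQL. This is a minimal sketch with made-up names (a trained model build called flu_shot_model and a directory-enabled stage called @flu_shot_docs); the pattern of calling the model's PREDICT method over a stage directory follows the Snowflake Document AI docs as I understand them.

```sql
-- Sketch: run a trained Document AI model over every file in a stage
-- and persist the raw JSON answers to a table.
-- flu_shot_model, its version number, and @flu_shot_docs are hypothetical names.
CREATE OR REPLACE TABLE flu_shot_extractions AS
SELECT
    relative_path,                 -- which file each row came from
    size,
    last_modified,
    flu_shot_model!PREDICT(
        GET_PRESIGNED_URL(@flu_shot_docs, relative_path),
        1                          -- model build version
    ) AS extraction                -- JSON: one entry per question, plus OCR metadata
FROM DIRECTORY(@flu_shot_docs);
```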
Why is this a big deal?
This product is cool for several reasons.
- You can build a bridge between the paper world and the digital world. I never thought the big box of paper invoices under my desk could make it into the cloud data warehouse, but now it can. Scan the paper invoices, upload them to Snowflake, run my Document AI model, and wham! I have the information I care about sitting in a curated table.
- It is extremely convenient to invoke machine learning from SQL. Why didn't we think of this sooner? What used to be a few hundred lines of gnarly data-loading code is now a handful of lines (SQL >>> Python / Spark / etc. for this kind of work).
- Building this in-house would be a major undertaking. Yes, OCR has been around a long time, but it can still be finicky. Fine-tuning an LLM is apparently not that hard anymore, and it gets easier every week. But stitching the two together in a way that achieves high accuracy across varied documents would take a long time to scrape together on your own. Months of polish.
Of course, some things still have to be built in-house. Once you have extracted information from a document, you have to figure out what to do with that information. That is the quicker job, though.
Our use case – surviving flu season
I work at a company called IntelyCare. We are in healthcare staffing, which means we help hospitals, nursing homes, and rehab centers find high-quality clinicians, either as contract workers or full-time hires.
Many of our facilities require that clinicians have an up-to-date flu shot. Last year, our clinicians uploaded more than 10,000 flu shot documents, among hundreds of thousands of documents overall. We reviewed all of these manually to make sure they were valid. Part of the joy of working in healthcare staffing!
Spoiler alert: using Document AI, we were able to reduce the number of flu shot documents needing hands-on review by ~50%, and we did it all in just a few weeks.
To pull this off, we did the following:
- Loaded a large batch of flu shot documents into Snowflake.
- Wrote our prompts, trained the model, tweaked the prompts a few times, and retrained the model…
- Built logic to compare the model output against the clinician's profile (e.g., do the names match?). There was some trial and error here around formatting names, dates, etc. (there is a sketch of this after the list).
- Built a "verdict" step to either approve the document or send it back to humans for review.
- Tested the full pipeline against a large set of previously reviewed documents. Looked closely at anything surprising.
- Repeated until our confusion matrix was satisfactory.
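For the curious, here is a rough sketch of what that comparison and verdict logic can look like in SQL. Every table name, question alias, threshold, and join key here is hypothetical, and it assumes the extraction JSON has the shape we saw: each question alias maps to a list of {score, value} pairs, with some overall OCR metadata alongside.

```sql
-- Sketch of the downstream "verdict" logic. Names and thresholds are made up.
WITH answers AS (
    SELECT
        e.relative_path,
        e.extraction:__documentMetadata:ocrScore::FLOAT AS ocr_score,       -- overall OCR confidence (key name as we saw it)
        e.extraction:patient_name[0]:value::STRING      AS extracted_name,  -- answer to "What is the patient's name?"
        e.extraction:patient_name[0]:score::FLOAT       AS name_confidence
    FROM flu_shot_extractions e
)
SELECT
    a.relative_path,
    CASE
        WHEN a.ocr_score >= 0.9
         AND a.name_confidence >= 0.9
         AND EDITDISTANCE(UPPER(a.extracted_name), UPPER(c.full_name)) <= 2  -- tolerate small OCR typos
        THEN 'AUTO_APPROVE'
        ELSE 'HUMAN_REVIEW'                                                  -- anything unsure goes to people
    END AS verdict
FROM answers a
JOIN clinician_profiles c
  ON a.relative_path = c.submitted_document_path;  -- hypothetical join key
```

The design choice that matters is that anything the pipeline is unsure about resolves to HUMAN_REVIEW; that is what keeps false positives near zero. Date handling works the same way and shows up in the lessons-learned section below.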
For this project, false positives are the risk. We do not want to approve an expired or invalid document. We kept iterating until the false positive rate hit zero. We will surely see a false positive eventually, but it will be fewer than we get today with the human review process.
False negatives, on the other hand, are not dangerous. If our pipeline doesn't like a flu shot document, it simply routes the document to the human team for review. If they go on to approve the document, everything proceeds as usual.
The model handles the clean / simple documents, which make up ~50% of all flu shot documents. If it is at all unsure, the document goes back to humans as before.
Things we learned along the way
- The model is much better at reading a document than at making decisions or doing math about the document.
Initially, our prompts tried to get the model to judge the document's validity.
Bad: Is the document expired?
We got far better results by limiting our prompts to questions that can be answered by looking at the document. The LLM doesn't decide anything. It simply extracts the right data points from the page.
Good: What is the expiration date?
Store the results and do the math downstream.
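Concretely, the split looks something like this (expiration_date is a hypothetical question alias and flu_shot_extractions is the hypothetical results table from earlier): the model only reads the date off the page, and ordinary SQL decides whether it is still valid.

```sql
-- Sketch: the prompt answers "What is the expiration date?" and nothing more.
-- Whether the document is still valid is plain SQL, computed downstream.
SELECT
    relative_path,
    TRY_TO_DATE(extraction:expiration_date[0]:value::STRING)                 AS expiration_date,
    TRY_TO_DATE(extraction:expiration_date[0]:value::STRING) >= CURRENT_DATE AS is_still_valid
FROM flu_shot_extractions;
```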
- You still need to think about your training data.
We had oversampled one clinic in our training data. Call this clinic Ben. One of our prompts is "What is the patient's name?" Because "Ben" appeared in the training data so many times, any sufficiently mysterious scribble would come back with "Ben" as the patient's name.
So the fundamentals still apply. Over/under-sampling is still a thing. We tried again with a thoughtfully assembled set of training documents (see the sketch below) and things improved dramatically.
Document AI is good magic, but it is not that magical. The basics still matter.
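For what it is worth, the "thoughtfully assembled" part was mostly a sampling query along these lines; a hypothetical sketch that caps how many training documents any one clinic can contribute, so no single source dominates the fine-tuning set.

```sql
-- Sketch: cap training documents per clinic so no one clinic (looking at you, Ben)
-- dominates the training set. Table and column names are hypothetical.
SELECT relative_path, clinic_name
FROM reviewed_flu_shot_documents
QUALIFY ROW_NUMBER() OVER (PARTITION BY clinic_name ORDER BY RANDOM()) <= 20;  -- at most 20 docs per clinic
```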
- The model can be fooled by a flu shot written on a napkin.
As far as I can tell, Snowflake does not offer a way to embed the document image itself. You can create embeddings of the extracted text, but that will not tell you whether the text was scrawled on a napkin. As long as the writing looks valid, the model and the downstream logic will give it the green light.
You could fix this fairly easily by comparing embeddings of the submitted document images against embeddings of previously approved documents. Anything that lands way out in left field gets sent to human review. That is a straightforward job, but for now you would have to do it outside Snowflake.
- It was less expensive than I expected.
Snowflake has a reputation for being pricey. And HIPAA compliance concerns put this project on a pricier account tier. I am used to worrying about running up the Snowflake tab.
In the end, we had to try hard to spend more than $100/week while training the model. We ran thousands of documents through the model every few days to measure its accuracy as we iterated, and still could not blow the budget.
Better still, it saves money relative to the review process. The cost of the AI reviewing 1,000 documents (and auto-approving 500 of them) is ~20% of what we used to spend having humans review those 500.
Conclusion
I am impressed by how quickly we were able to complete a project of this magnitude using Document AI. We went from months to days. I give it 4 out of 5 stars, and I am open to awarding the 5th star if Snowflake gives us access to image embeddings.
Since the flu shots, we have deployed similar models for other document types with the same or better results. And thanks to all the prep work, instead of dreading the next flu season, we are ready to bring it on.



