Comparing OCR (Optical Character Recognition) Models/Systems in 2025

Optical character recognition has moved from plain text extraction to document intelligence. Modern systems must read scanned and digital PDFs in one pass, preserve layout, detect tables, extract key-value pairs, and work with more than one language. Many teams now also want OCR that can feed RAG and agent pipelines directly. In 2025, 6 systems cover real-world workloads:
- Google Cloud Document AI, Enterprise Document OCR
- Amazon Textract
- Microsoft Azure AI Document Intelligence
- ABBYY FineReader Engine and FlexiCapture
- PaddleOCR 3.0
- DeepSeek OCR, Contexts Optical Compression
The goal of this comparison is not to rank them on a single metric, because they target different constraints. The goal is to show which system to use for a given document volume, deployment model, language set, and downstream AI stack.

Evaluation dimensions
We compare on 6 stable dimensions:
- Core OCR quality: scanned documents, photographed images and digital PDFs.
- Layout and structure: tables, key-value pairs, selection marks, reading order.
- Language and handwriting coverage.
- Deployment model: fully managed, container, on premises, self hosted.
- Integration: with LLM, RAG and IDP tools.
- Cost at scale.
1. Google Cloud Document AI, Enterprise Document OCR
Google's Enterprise Document OCR takes PDFs and images, whether scanned or digital, and returns text with layout, tables, key-value pairs and selection marks. It also exposes handwriting recognition in 50 languages and can detect math and font style. This matters for financial statements, educational forms and archives. Output is structured JSON that can be sent to Vertex AI or any RAG pipeline.
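To make "structured JSON" more concrete, the following is a minimal sketch of pulling paragraph text out of a Document AI style response. It assumes the JSON form of the `Document` resource, with a top-level `text` string and `pages[].paragraphs[].layout.textAnchor.textSegments` carrying `startIndex`/`endIndex` offsets into that string. The sample payload and the helper name `paragraph_texts` are illustrative and not taken from the article.

```python
from typing import Any


def paragraph_texts(document: dict[str, Any]) -> list[str]:
    """Return the text of every paragraph in a Document AI style dict."""
    full_text = document.get("text", "")
    out = []
    for page in document.get("pages", []):
        for para in page.get("paragraphs", []):
            anchor = para.get("layout", {}).get("textAnchor", {})
            segments = anchor.get("textSegments", [])
            # Assumption: the JSON encoding stores offsets as strings and
            # may omit startIndex when it is zero.
            parts = [
                full_text[int(seg.get("startIndex", 0)):int(seg["endIndex"])]
                for seg in segments
            ]
            out.append("".join(parts).strip())
    return out


# Hand-written sample payload (hypothetical, not real service output).
sample = {
    "text": "Invoice 1042\nTotal due: 310.00\n",
    "pages": [{"paragraphs": [
        {"layout": {"textAnchor": {"textSegments": [{"endIndex": "13"}]}}},
        {"layout": {"textAnchor": {"textSegments": [
            {"startIndex": "13", "endIndex": "31"}]}}},
    ]}],
}
print(paragraph_texts(sample))
```

Because every layout element points back into one shared `text` string, the same slicing approach should carry over to lines, tokens and table cells, which is what makes the output convenient to chunk for a RAG index.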
Strengths
- High-quality OCR on business documents.
- Strong layout graph and table detection.
- One pipeline for digital and scanned PDFs, which keeps ingestion simple.
- Enterprise grade, with IAM and data residency.
Limits
- It is a metered Google Cloud service.
- Custom document types still require configuration.
Use when your data is already on Google Cloud or when you must preserve layout for a later LLM stage.
2. Amazon Textract
Textract provides two API lanes: synchronous for small documents and asynchronous for large multipage PDFs. It extracts text, tables, forms and signatures and returns them as blocks with relationships. In 2025, AnalyzeDocument can also answer queries over the page, which simplifies invoice or claim extraction. Integration with S3, Lambda and Step Functions makes it easy to turn Textract into an ingestion pipeline.
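The "blocks with relationships" model can be sketched as follows. This is a minimal example, assuming the Textract response shape in which `KEY_VALUE_SET` blocks carry `EntityTypes` of `KEY` or `VALUE` and link to each other and to `WORD` blocks through `Relationships`. The sample blocks are hand-written for illustration, and `key_value_pairs` is a hypothetical helper name.

```python
def key_value_pairs(blocks: list[dict]) -> dict[str, str]:
    """Join KEY blocks to their VALUE blocks in a Textract style block list."""
    by_id = {b["Id"]: b for b in blocks}

    def related_ids(block: dict, rel_type: str) -> list[str]:
        return [i for rel in block.get("Relationships", [])
                if rel["Type"] == rel_type for i in rel["Ids"]]

    def text_of(block: dict) -> str:
        # Only WORD children carry text; other child types are skipped.
        words = [by_id[i]["Text"] for i in related_ids(block, "CHILD")
                 if by_id[i]["BlockType"] == "WORD"]
        return " ".join(words)

    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if is_key:
            values = [by_id[i] for i in related_ids(block, "VALUE")]
            pairs[text_of(block)] = " ".join(text_of(v) for v in values)
    return pairs


# Hand-written sample (hypothetical, not real service output).
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Number:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "1042"},
]
print(key_value_pairs(sample_blocks))
```

The same function should work on the block list from either API lane, since the synchronous and asynchronous paths differ in how results are fetched rather than in the block structure.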
Strengths
- Reliable table and key-value extraction for receipts, invoices and insurance forms.
- Clear sync and batch processing model.
- Tight AWS integration, good for serverless and IDP on S3.
Limits
- Image quality has a visible effect, so camera uploads may need preprocessing.
- Customization is more limited than Azure custom models.
- Locked to AWS.
Use when the workload is already in AWS and you need structured JSON out of the box.
3. Microsoft Azure AI Document Intelligence
Azure's service, renamed from Form Recognizer, combines OCR, generic layout, prebuilt models and custom neural or template models. The 2025 release added layout and read containers, so enterprises can run the same model on premises. The layout model extracts text, tables, selection marks and document structure and is designed for further processing by LLMs.
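As an illustration of "further processing by LLMs", here is a minimal sketch that rebuilds a table from a layout style result and renders it as markdown for a prompt. It assumes a table object with `rowCount`, `columnCount` and a flat `cells` list whose entries carry `rowIndex`, `columnIndex` and `content`, which is the shape of the layout model's JSON as far as this sketch relies on it. Merged cells (row or column spans) are ignored, and the sample table is hand-written.

```python
def table_to_grid(table: dict) -> list[list[str]]:
    """Place each cell's content at its row/column position."""
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # Simplification: spans are not expanded, only the anchor cell is filled.
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
    return grid


def grid_to_markdown(grid: list[list[str]]) -> str:
    """Render a grid as a markdown table, treating the first row as header."""
    header, *rows = grid
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


# Hand-written sample (hypothetical, not real service output).
sample_table = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Amount"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Hosting"},
        {"rowIndex": 1, "columnIndex": 1, "content": "310.00"},
    ],
}
print(grid_to_markdown(table_to_grid(sample_table)))
```

Rendering tables as markdown rather than raw cell lists is one common choice when the next stage is an LLM prompt; other serializations such as CSV or HTML would work with the same grid.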
Strengths
- Best-in-class custom document models for line-of-business forms.
- Containers for hybrid and air-gapped deployments.
- Prebuilt models for invoices, receipts and identity documents.
- Clean JSON output.
Limits
- Accuracy on non-English documents can still be slightly behind ABBYY.
- Pricing and throughput must be planned because it is still a cloud-first product.
Use when you need to teach the system your own templates or when you are a Microsoft shop that wants the same model in Azure and on premises.
4. ABBYY FineReader Engine and FlexiCapture
ABBYY stays relevant in 2025 because of 3 things: accuracy on printed documents, very wide language coverage, and deep control over preprocessing and zoning. The current Engine and FlexiCapture products support 190 or more languages, export structured data, and can be embedded in Windows, Linux and VM workloads. ABBYY is also strong in regulated sectors where data cannot leave the premises.
Strengths
- Very high recognition quality on scanned contracts, passports and old documents.
- Largest language set in this comparison.
- FlexiCapture can be tuned to messy recurring documents.
- Mature SDKs.
Limits
- License cost is higher than open source.
- Deep learning based scene text is not the focus.
- Scaling to hundreds of nodes needs engineering.
Use when you must run on premises, must process many languages, or must pass compliance audits.
5. PaddleOCR 3.0
PaddleOCR 3.0 is an Apache-licensed open source toolkit that aims to bridge images and PDFs to LLM-ready structured data. It ships with PP-OCRv5 for multilingual recognition, PP-StructureV3 for document parsing and table reconstruction, and PP-ChatOCRv4 for key information extraction. It supports more than 100 languages, runs on CPU and GPU, and has mobile and edge variants.
Strengths
- Free and open, no per-page cost.
- Fast on GPU, usable on edge.
- Covers detection, recognition and structure in one project.
- Active community.
Limits
- You must deploy, monitor and update it.
- For European or financial layouts you often need postprocessing or fine-tuning.
- Security and durability are your responsibility.
Use when you want full control, or you want to build a self-hosted document intelligence service for LLM RAG.
6. DeepSeek OCR, Contexts Optical Compression
DeepSeek OCR was released in October 2025. It is not a classical OCR. It is an LLM-centric vision language model that compresses long text and documents into high-resolution images, then decodes them. The public model card and blog report around 97 percent decoding accuracy at 10 times compression and about 60 percent at 20 times compression. It is MIT licensed, built around a 3B decoder, and already supported in vLLM and Hugging Face. This makes it interesting for teams that want to reduce token cost before calling an LLM.
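The trade-off in the reported figures can be worked through with simple arithmetic. The sketch below assumes that "N times compression" means the number of text tokens divided by the number of vision tokens that replace them; the 8,000-token document is a made-up example, and the accuracy values are only the two reported points quoted above, not a fitted curve.

```python
import math

# Reported points from the text above: ~97% at 10x, ~60% at 20x.
REPORTED_ACCURACY = {10: 0.97, 20: 0.60}


def vision_token_budget(text_tokens: int, compression_ratio: int) -> int:
    """Vision tokens needed if text is compressed by the given ratio."""
    return math.ceil(text_tokens / compression_ratio)


# Hypothetical document of 8,000 text tokens.
for ratio, accuracy in REPORTED_ACCURACY.items():
    budget = vision_token_budget(8000, ratio)
    print(f"{ratio}x: 8000 text tokens -> {budget} vision tokens, "
          f"reported decoding accuracy about {accuracy:.0%}")
```

Going from 10x to 20x only halves the token budget again while the reported accuracy falls sharply, which is why the ratio should be chosen per workload rather than maximized.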
Strengths
- Self hosted, GPU ready.
- Very good for long context and mixed text plus tables, because compression happens before decoding.
- Open license.
- Fits modern agentic stacks.
Limits
- There is no standard public benchmark yet that puts it against Google or AWS, so enterprises must run their own tests.
- Needs a GPU with enough VRAM.
- Accuracy depends on the chosen compression ratio.
Use when you want OCR that is optimized for LLM pipelines rather than for archive digitization.
Head-to-head comparison
| Feature | Google Cloud Document AI (Enterprise Document OCR) | Amazon Textract | Azure AI Document Intelligence | ABBYY FineReader Engine / FlexiCapture | PaddleOCR 3.0 | DeepSeek OCR |
|---|---|---|---|---|---|---|
| Core task | OCR for scanned and digital PDFs, returns text, layout, tables, KVP, selection marks | OCR for text, tables, forms, IDs, invoices, receipts, with sync and async APIs | OCR plus prebuilt and custom models, layout, containers for on premises | High-accuracy OCR and document capture for large, multilingual, on-premises workloads | Open source OCR and document parsing, PP-OCRv5, PP-StructureV3, PP-ChatOCRv4 | LLM-centric OCR that compresses document images and decodes them for long-context AI |
| Text and layout | Blocks, paragraphs, lines, words, symbols, tables, key-value pairs, selection marks | Text, relationships, tables, forms, query responses, lending analysis | Text, tables, KVP, selection marks, figure extraction, structured JSON, v4 layout model | Zoning, tables, form fields, classification through FlexiCapture | StructureV3 rebuilds tables and document hierarchy, KIE modules available | Reconstructs content after optical compression, good for long pages, needs local evaluation |
| Handwriting | Printed and handwriting for 50 languages | Handwriting in forms and free text | Handwritten text supported in read and layout models | Printed very strong, handwriting available via capture templates | Supported, may need domain tuning | Depends on image and compression ratio, not yet reported |
| Languages | 200+ OCR languages, 50 handwriting languages | Main business languages, invoices, IDs, receipts | Major business languages, expanding in v4.x | 190 to 201 languages depending on edition, widest in this table | 100+ languages in v3.0 stack | Multilingual via VLM decoder, coverage good but not exhaustively published, test per project |
| Deployment | Fully managed Google Cloud | Fully managed AWS, synchronous and asynchronous jobs | Managed Azure service plus read and layout containers (2025) for on premises | On premises, VM, customer cloud, SDK centric | Self hosted, CPU, GPU, edge, mobile | Self hosted, GPU, vLLM ready, license to verify |
| Integration path | Exports structured JSON to Vertex AI, BigQuery, RAG pipelines | Native to S3, Lambda, Step Functions, AWS IDP | Azure AI Studio, Logic Apps, AKS, custom models, containers | BPM, RPA, ECM, IDP platforms | Python pipelines, open RAG stacks, custom document services | LLM and agent stacks that want to reduce tokens first, vLLM and HF supported |
| Cost model | Pay per 1,000 pages, volume discounts | Pay per page or document, AWS billing | Consumption based, container licensing for local runs | Commercial license, per server or per volume | Free, infra only | Free repo, GPU cost, license to verify |
| Best fit | Mixed scanned and digital PDFs on Google Cloud, layout preserved | AWS ingestion of invoices, receipts, loan packages at scale | Microsoft shops that need custom models and hybrid | Regulated, multilingual, on-premises processing | Self-hosted document intelligence for LLM and RAG | Long-document LLM pipelines that need optical compression |
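The cost model row can be turned into a rough break-even estimate between a pay-per-1,000-pages service and a self-hosted stack that costs infrastructure only. The sketch below is a simplified model under stated assumptions: self-hosted cost is a fixed monthly amount plus GPU hours, and all parameter names and the numbers in the usage example are hypothetical placeholders, not vendor prices.

```python
from typing import Optional


def monthly_cost_managed(pages: int, price_per_1000_pages: float) -> float:
    """Cost of a metered service billed per 1,000 pages."""
    return pages / 1000 * price_per_1000_pages


def monthly_cost_self_hosted(pages: int, fixed_infra_cost: float,
                             pages_per_gpu_hour: float,
                             gpu_hour_cost: float) -> float:
    """Fixed monthly cost plus GPU time proportional to page volume."""
    return fixed_infra_cost + pages / pages_per_gpu_hour * gpu_hour_cost


def break_even_pages(price_per_1000_pages: float, fixed_infra_cost: float,
                     pages_per_gpu_hour: float,
                     gpu_hour_cost: float) -> Optional[float]:
    """Monthly page volume above which self hosting is cheaper, if any."""
    managed_per_page = price_per_1000_pages / 1000
    self_per_page = gpu_hour_cost / pages_per_gpu_hour
    if managed_per_page <= self_per_page:
        return None  # self hosting never catches up in this model
    return fixed_infra_cost / (managed_per_page - self_per_page)


# Placeholder inputs for illustration only; substitute your own quotes.
pages = break_even_pages(price_per_1000_pages=2.0, fixed_infra_cost=1500.0,
                         pages_per_gpu_hour=2000.0, gpu_hour_cost=1.0)
print(f"break-even volume: {pages:,.0f} pages per month")
```

The model leaves out engineering time, monitoring and licensing per server, so it is better read as a lower bound on the volume at which self hosting starts to pay off.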
What to use when
- Cloud IDP on invoices, receipts, medical forms: Amazon Textract or Azure AI Document Intelligence.
- Mixed scans and digital PDFs for banks and telcos on Google Cloud: Google Document AI Enterprise Document OCR.
- Government archive or publisher with 150 or more languages and no cloud: ABBYY FineReader Engine and FlexiCapture.
- Startup or media company building its own RAG over PDFs: PaddleOCR 3.0.
- LLM platform that wants to shrink context before inference: DeepSeek OCR.
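The rules of thumb above can be written down as a small selector. This is only a sketch: the `Requirements` fields, the 150-language threshold taken from the archive example, and the order in which the rules are checked are assumptions made for illustration, and the result is a starting shortlist rather than a decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    cloud: Optional[str] = None      # "aws", "azure", "gcp", or None
    on_premises_only: bool = False
    language_count: int = 1
    self_hosted_rag: bool = False
    shrink_llm_context: bool = False


def recommend(req: Requirements) -> str:
    """Map requirements to the system suggested by the list above."""
    # Rule order is an assumption: the most specific needs are checked first.
    if req.shrink_llm_context:
        return "DeepSeek OCR"
    if req.self_hosted_rag:
        return "PaddleOCR 3.0"
    if req.on_premises_only or req.language_count >= 150:
        return "ABBYY FineReader Engine and FlexiCapture"
    by_cloud = {
        "gcp": "Google Document AI Enterprise Document OCR",
        "aws": "Amazon Textract",
        "azure": "Azure AI Document Intelligence",
    }
    return by_cloud.get(req.cloud or "", "No single match, use the comparison table")


print(recommend(Requirements(cloud="aws")))
print(recommend(Requirements(on_premises_only=True, language_count=160)))
```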
Google Document AI, Amazon Textract, and Azure AI Document Intelligence all deliver layout-aware OCR with tables and selection marks, while ABBYY FlexiCapture exports to XML and a newer JSON format for on-premises processing. PaddleOCR 3.0 provides Apache-licensed PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4 for self-hosted document parsing. DeepSeek OCR reports 97% decoding precision below 10x compression and about 60% at 20x, so enterprises must run local benchmarks before rollout in production workloads. Overall, OCR in 2025 is document intelligence first, recognition second.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.



