Machine Learning

Ukwakha i-multimodal rag ephendula ngombhalo, izithombe namatafula avela emithonjeni

Isizukulwane (i-Rag) kube ngolunye lwezicelo zokuqala futhi eziphumelele kakhulu ze-ai ekhiqizayo. Noma kunjalo, bambalwa izingxoxo ezibuyayo izithombe, amatafula, kanye nezibalo kusuka kumadokhumenti omthombo eceleni kwezimpendulo zemibhalo.

Kulokhu okuthunyelwe, ngihlola ukuthi kungani kunzima ukwakha uhlelo oluthembekile, lwe-multimodal rag, ikakhulukazi Imibhalo Eyinkimbinkimbi Njengamaphepha okucwaninga nemibiko yenhlangano – okuvame ukufaka umbhalo ominyene, amafomula, amatafula, namagrafu.

Futhi, lapha ngiveza indlela ye Ipayipi elithuthukisiwe le-multimodal rag Lokho kuhambisa imiphumela engaguquki, esezingeni eliphakeme le-multimodal kulo lonke lolu hlobo lwedokhumenti.

Dataset kanye setup

Ukwenza isibonelo, ngakha isisekelo solwazi esincane se-multimodal ngisebenzisa le mibhalo elandelayo:

  1. Amamodeli we-clip ahlelwe ngokugcwele asebenza kahle abafundi
  2. I-VeccPaint: Ukuhlanganiswa kwe-vendatic okuthuthukile okuthuthukile kusetshenziswa ama-stroke isitayela priory
  3. Isu lokumaketha lezinsizakalo zezezimali: ukulima ngezimali nokucubungula umdumbula, ummbila kanye namaketanga wenani le-plancain eCôte d'Ivoire

Imodeli yolimi esetshenzisiwe GPT-4Okanye nokushumeka engikusebenzisile Ukufaka umbhalo-3-okuncane.

Ukwakhiwa okujwayelekile kwe-multimodal rag

Emthiyori, i-multimodal rag bot kufanele:

  • Amukela umbhalo nesithombe imibuzo.
  • Buyisela emuva umbhalo nesithombe izimpendulo.
  • Buyisa ingqikithi kusuka kwimithombo yomibili kanye nemithombo yesithombe.

Ipayipi elijwayelekile libukeka kanjena:

  1. Incela
  • Ukuphawula & Ukudonswa: Hlukanisa amadokhumenti abe yizingxenye zombhalo bese ukhipha izithombe.
  • Isifinyezo sesithombe: Sebenzisa i-LLM ukukhiqiza izihloko noma izifingqo zesithombe ngasinye.
  • Ukugodlwa kwama-multi-Vector: Dala ukushumeka kwemibhalo ye-chunks, izifingqo zesithombe, futhi ngokuzikhethela ngezinto eziluhlaza izici (isib. Ukusebenzisa isiqeshana).

2. Ukukhomba

  • Isitolo Sokunyakazisa kanye neMetadata database we-vector.

3. Ukubuyisa

  • Ngombuzo womsebenzisi, yenza ukusesha okufanayo:
  • Ukushumeka kombhalo (ukufana kombhalo)
  • Isifinyezo Sokuvinjwa Kwezithombe (ngokuhambisana kwesithombe)

4. Isizukulwane

  • Sebenzisa i-multimodal llm ukuhlanganisa impendulo yokugcina usebenzisa womabili umbhalo nezithombe.

Ukucabanga okungokwemvelo

Le ndlela ithatha lokho I-caption noma isifinyezo sesithombe esikhiqizwe kokuqukethwe kwayo, ngaso sonke isikhathi siqukethe umongo owanele Mayelana nombhalo noma izingqikithi ezivela kudokhumenti, lapho lesi sithombe kungaba impendulo efanelekile.

Emibhalweni yezwe yangempela, lokhu akunjalo.

Isibonelo: Ukulahleka komongo emibikweni yenhlangano

Thatha “Isu Lokukhangisa Lezinsizakalo Zezimali (# 3 kuDataset)” Bika kudathabhethi. Esifingweni sayo esiphezulu, kunamatafula amabili afanayo abukeka ekhombisa Inhloko yokusebenza Izidingo – eyodwa ye Abakhiqizi abayinhloko (abalimi) neyodwa abaxhunyiwe. Yilezi ezilandelayo:

Ithebula le-Capital Capital for abakhiqizi abayinhloko
Ithebula le-Capital Capital for processors

I-GPT-4O yakha okulandelayo kwetafula lokuqala:

“Ithebula lichaza izinhlobo ezahlukahlukene zokusebenza kwezimali ezikhethwa imali ngamabhizinisi ezolimo, kufaka phakathi izinhloso zazo nokutholakala kuzo zonke izimo ezihlukile”

Futhi okulandelayo kwetafula lesibili:

“Ithebula linikeza ngesibukezo sokusebenza kwezimali ezisebenza ngezimali, ichaza izinhloso zayo kanye nokusebenza okungenzeka ngezimo ezahlukahlukene zamabhizinisi, ikakhulukazi abathengi basemasheya”

Kokubili kubonakala kulungile ngawodwana – kepha akukho kuthumba ingqikithi lokho kuhlukanisa abakhiqizi ukusuka abaxhunyiwe.

Lokhu kusho ukuthi bayoba njalo itholwe ngokungalungile Ngemibuzo ebuza ngqo ngabakhiqizi noma ama-processors kuphela. Kunamanye amatafula afana ne-capex, amathuba okuxhasa imali lapho kungenzeka khona umagazini ofanayo.

Ngephepha le-veccpaineter, lapho umkhiki 3 ephepheni abonisa ipayipi le-veccpaine, i-GPT-4O yakha amagama-ncazo njenge “Sibutsetelo ngohlaka oluhlongozwayo lokukhishwa kwesitayela esisuselwa ku-Strike kanye ne-SVG synthersesis enesitayela ngezinkinga zeleveli ye-stroke,” Ulahlekelwe iqiniso lokuthi imelela ingqikithi eyisisekelo yephepha, igama elithi “veccPainter” ngababhali.

Futhi ngokokuzalwa kolimi okufanayo ifomula yokulahleka kwe-distillation echazwe ku-sec 3.3 yephepha lokuhlawulisa isiqeshana, i-caption eyenziwe “I-equation emele ukulahla okuhlukanisayo okuvumayo (i-VLD) ukulahleka, okuchazwe njengesamba se-Kllback-Leibler (KL) ukuphambuka phakathi kokuqagelwa nokuqondisa ukusatshalaliswa kokungena ngemvume ngaphezulu kwe-batch yokufaka.”, lapho umongo wombono nokuxhumeka kwezilimi kutholakala khona.

Kuyakwazi futhi ukuthi kuphawulwe ukuthi emaphepheni okucwaninga, izibalo namatafula anombhali onikezwe amagama anikezwe, noma kunjalo, ngesikhathi senqubo yokukhishwa, lokhu kukhishwe hhayi njengengxenye yesithombe, kodwa njengengxenye yombhalo. Futhi ukuma kwe-caption kwesinye isikhathi kungenhla futhi ngezinye izikhathi ngaphansi kwesibalo. Ngokuqondene necebo lokumaketha libika, amatafula ashumekiwe nezinye izithombe azinakho igama elinamathiselwe elichaza isibalo.

Lokho okungenhla okuboniswe ukuthi amadokhumenti omhlaba wangempela awalandeli noma iyiphi ifomethi ejwayelekile yombhalo, izithombe, amatafula kanye namagama-ncazo, ngaleyo ndlela enza inqubo yokuhlanganisa izibalo ezinzima.

Ipayipi elisha nelithuthukile le-multimodal rag

Ukuxazulula lokhu, ngenza izinguquko ezinkulu ezimbili.

1. Izifingqo zesithombe sazi

Esikhundleni sokubuza i-LLM ukufingqa isithombe, ngikhipha Umbhalo ngokushesha ngaphambi nangemva kwesibalo – Kuze kube izinhlamvu ezingama-200 endaweni ngayinye.
Ngale ndlela, i-turction yesithombe ifaka:

  • Le khasi Umbhali-enikeziwe I-Caption (uma ikhona)
  • Le khasi Ukulandisa okuzungezile lokho kukunikeza okushiwo

Noma ngabe idokhumenti ingenase-ncazo elihlelekile, lokhu kunikeza isifinyezo esinembile esingokomqondo.

2. Ukukhethwa kombhalo okuqondiswa umbhalo okuqondisiwe ngesikhathi sokukhiqizwa

Ngesikhathi sokubuyisa, mina ningenzi Qondanisa umbuzo womsebenzisi ngqo ngezithombe zesithombe. Lokhu kungenxa yokuthi umbuzo womsebenzisi uvame ukumfushane kakhulu ukuhlinzeka ngomongo owanele wokubuyiselwa kwesithombe (isib. Yini …?)
Esikhundleni salokho:

  • Okokuqala, khiqiza Impendulo ebhaliwe Kusetshenziswa imibhalo ephezulu ye-chunks etholwe ngomongo.
  • Ngemuva kwalokho, Khetha izithombe ezimbili ezinhle kakhuluUkuze impendulo yombhalo ifaniswe nesihloko sezithombe

Lokhu kuqinisekisa ukuthi izithombe zokugcina zikhethiwe maqondana nempendulo yangempela hhayi umbuzo wedwa.

Nansi umdwebo we Ukukhishwa ukushumeka Iphayiphu:

Ukukhishwa ukushumeka ipayipi

Nepayipi le Ukubuyiselwa kwemali nokuphendulakungokulandelayo:

Ukubuyiselwa kwemali nokuphendula

Imininingwane yokuqalisa

Isinyathelo 1: Khipha umbhalo nezithombe

Sebenzisa i-Adobe PDF ekhiphe i-API ukuze uzungeze ama-PDF ku:

  • Amanani / namatafula / amafolda ngamafayela we-.png
  • Ifayela le-Ahleddata.json eliqukethe izikhundla, umbhalo kanye nezindlela zefayela

Ngithole le api ukuthi ithembeke kakhulu kunemibala yolwazi efana ne-Pymupdd, ikakhulukazi yokukhipha amafomula nemidwebo.

Isinyathelo 2: Dala ifayela lombhalo

Consate zonke izinto zombhalo ezivela ku-JSON ukudala i-corpus yombhalo eluhlaza:

# Extract text, sorted by Page and vertical order (Bounds[1])
elements = data.get("elements", [])
# Concatenate text
all_text = []
for el in elements:
  if "Text" in el:
    all_text.append(el["Text"].strip())
    final_text = "n".join(all_text)

Isinyathelo 3: Yakha izihloko zesithombe : Hamba ngento ngayinye ye- `ehlelekile.json`, hlola ukuthi ifayela le-elemento liphela ngo-` .png`. Layisha ifayela kusuka kumanani namathebula wetafula ledokhumenti, bese usebenzisa i-LLM ukwenza isheke lekhwalithi esithombeni. Lokhu kuyadingeka njengoba inqubo yokukhishwa izothola izithombe ezithile ezingekho emthethweni, ezincane, unhlokweni kanye nonyawo, ama-logo enkampani njll.

Qaphela ukuthi asibuzi i-LLM ukuthi ihumushe izithombe; Vele uphawule uma kucacile futhi kufanelekile ngokwanele ukufakwa kwi-database. I-Prompt ye-LLM izoba fana:

Analyse the given image for quality, clarity, size etc. Is it a good quality image that can be used for further processing ? The images that we consider good quality are tables of facts and figures, scientific images, formulae, everyday objects and scenes etc. Images of poor quality would be any company logo or any image that is illegible, small, faint and in general would not look good in a response to a user query.
Answer with a simple Good or Poor. Do not be verbose

Ngokulandelayo sakha isifinyezo sesithombe. Kulokhu, ku- `ehlelekile.json`, sibheka izinto ezingemuva kwe-` .png` element, futhi siqoqe izinhlamvu ezingama-200 esiqondisweni ngasinye sezinhlamvu ezingama-400. Lokhu kwakha amagama-ncazo lezithombe noma isifinyezo. I-Code Snippet imi kanjena:

# Collect before
j = i - 1
while j >= 0 and len(text_before) < 200:
  if "Text" in elements[j] and not ("Table" in elements[j]["Path"] or "Figure" in elements[j]["Path"]):
    text_before = elements[j]["Text"].strip() + " " + text_before
    j -= 1
    text_before = text_before[-200:]
# Collect after
k = i + 1
while k < len(elements) and len(text_after) < 200:
  if "Text" in elements[k]:
    text_after += " " + elements[k]["Text"].strip()
    k += 1
    text_after = text_after[:200]

Senza lokhu ngesibalo ngasinye netafula kuyo yonke imibhalo ku-database yethu, bese sigcina izihloko zesithombe njenge-metadata. Endabeni yami, ngigcina njenge- `mage_capted.json` file.

Lolu shintsho olulula lwenza a Umehluko omkhulu– Amagama aphumela avela kufaka umongo onengqondo. Isibonelo, amaphilisi engiwatholayo amatafula amabili asebenzayo asebenza embikweni ye-Marketing Strategy alandelayo. Qaphela ukuthi izimo manje sezinjani manje kuhlukaniswe ngokucacilefuthi faka abalimi kanye nabaprosesa.

"caption": "o farmers for their capital expenditure needs as well as for their working capital needs. The table below shows the different products that would be relevant for the small, medium, and large farmers. Working Capital Input Financing For purchase of farm inputs and labour Yes Yes Yes Contracted Crop Loan* For purchase of inputs for farmers contracted by reputable buyers Yes Yes Yes Structured Loan"
"caption": "producers and their buyers b)t Potential Loan products at the processing level At the processing level, the products that would be relevant to the small scale and the medium_large processors include Working Capital Invoice discounting_ Factoring Financing working capital requirements by use of accounts receivable as collateral for a loan Maybe Yes Warehouse receipt-financing Financing working ca"

Isinyathelo 4: Umbhalo we-chunk ukhiqize ukushumeka

Ifayela lombhalo ledokhumenti lihlukaniswe ngama-chunks ezinhlamvu eziyi-1000, usebenzisa `Recsususuracteckaractectexpletter` kusuka `I-Langchain`futhi igcinwe. Ukushumeddings okwenzelwe ama-chunks wombhalo kanye namagama wesithombe, ajwayelekile futhi agcinwe njenge-`I-Faiss`Izikhombi

Isinyathelo 5: Ukubuyiselwa Komongo nokuphendula

Umbuzo womsebenzisi ufana futhi nemibhalo emi-5 ephezulu yombhalo itholwa njengomongo. Ngemuva kwalokho sisebenzisa la ma-chunks abuyiselwe kanye nombuzo womsebenzisi ukuthola impendulo yombhalo usebenzisa i-LLM.

Esinyathelweni esilandelayo, sithatha impendulo yombhalo ekhiqizwayo futhi sithole imidlalo emi-2 ephezulu kakhulu yesithombe (esekelwe kuma-memeddings we-caption) impendulo. Lokhu kwehlukile ngendlela yendabuko yokufanisa umbuzo womsebenzisi esithombeni esishumekekile futhi inikeze imiphumela engcono kakhulu.

Kunesinyathelo esisodwa sokugcina. Izihloko zethu zesithombe zazisuselwa ezinhlotsheni ezingama-400 ezizungeze isithombe kudokhumenti, futhi kungenzeka kungabi isihloko esinengqondo nesifushane sokuboniswa. Ngakho-ke, kwizithombe zokugcina ezikhethiwe ezi-2, sicela i-LLM ukuthi ithathe izihloko zesithombe kanye nezithombe bese zidala igama elifushane elilungele ukubonisa impendulo yokugcina.

Nansi ikhodi ye-Logic engenhla:

# Retrieve context
result = retrieve_context_with_images_from_chunks(
user_input,
content_chunks_json_path,
faiss_index_path,
top_k=5,
text_only_flag= True
)
text_results = result.get("top_chunks", [])
# Construct prompts
payload_1 = construct_prompt_text_only (user_input, text_results)
# Collect responses (synchronously for tool)
assistant_text, caption_text = "", ""
for chunk in call_gpt_stream(payload_1):
  assistant_text += chunk
  lst_final_images = retrieve_top_images (assistant_text, caption_faiss_index_path, captions_json_path, top_n=2)
if len(lst_final_images) > 0:
  payload = construct_img_caption (lst_final_images)
for chunk in call_gpt_stream(payload):
  caption_text += chunk
response = {
"answer": assistant_text + ("nn" + caption_text if caption_text else ""),
"images": [x['image_name'] for x in lst_final_images],
}
return response

Imiphumela yokuhlola

Masigijime imibuzo okukhulunywe ngayo ekuqaleni kwaleli bhulogi ukubona ukuthi izithombe zibuyiselwe yini zihambelana nombuzo womsebenzisi. Ukuze kube lula, ngiphrinta kuphela izithombe namazwibela abo abonisiwe hhayi impendulo yombhalo.

Umbuzo 1: Yini imalimboleko yemalimboleko yemalimboleko yomkhiqizi oyinhloko?

Umdwebo 1: Ukubuka konke kokusebenza kwezimali ezisebenza ngezimali ezisebenza ngezimali zabalimi abancane, abaphakathi nabakhulu.

Umdwebo 2: Izinketho zezindleko zezimali ezikhokhelwa imali yabalimi abaphakathi nabakhulu.

Umphumela wesithombe wombuzo 1

Umbuzo 2: Yini imalimboleko yokubolekwa kwemali mboleko nokusebenza kwe-capital capital of processors?

Umdwebo 1: Ukubuka konke kwemikhiqizo yemalimboleko yokusebenza yemali esebenza imali yama-procesdors amancane naphakathi nendawo.
Umdwebo 2: Imikhiqizo yemalimboleko yemalimboleko yokuthenga kwemishini kanye nokunwetshwa kwebhizinisi ezingeni lokucubungula.

Umphumela Wemifanekiso Umbuzo 2

Umbuzo 3 : Uyini Ulimi Lombukiso Wokulingisa?

Umdwebo 1: I-Vision-Ulimi Ukufana Ifomula yokulahleka kokudlulisela ukuvumelana okuphathelene neziqeshana eziqeqeshelwe ngaphambilini kumamodeli ahlelwe kahle.

Umdwebo 2: Umsebenzi wokugcina wenhloso ehlanganisa ukulahleka kwe-distillation, ukulahleka okuhlukile kokulahlekelwa, nokufana kokubuka ulimi lokudideka ngokulinganisa okuhambisana ne-hyperpareters yokulinganisa.

Ukubuyiselwa Kwefomula Yemibuzo 3

Umbuzo 4 : Yini ipayipi le-vectaine?

Umdwebo 1: Ukubuka konke kwesitebhisi sohlangothi isitayela nenqubo ye-SVG Synthesisis, ukugqamisa ukuhlukaniswa kwesifo sohlangothi, ukulahleka kwesitayela, kanye nesizukulwane esisekela umbhalo.

Umdwebo 2: Ukuqhathaniswa kwezindlela ezahlukahlukene zokudluliswa kwesitayela kuwo wonke amafomethi we-raster kanye ne-vector, ekhombisa ukusebenza ngendlela ehlongozwayo ekugcinweni kokuguquguqukayo kwe-stylistic.

Ukubuyiselwa Kwezithombe Kwemibuzo 4

Ukugcina

Le pipeline ethuthukisiwe ikhombisa ukuthi kanjani Isifinyezo sesithombe sazi na- Ukukhethwa Kwezithombe Zokuphendula Umbhaloingathuthukisa kakhulu ukunemba kwe-multimodal retrieval.

Indlela ikhiqiza izimpendulo ezicebile, ze-multimodalLokho kuhlanganisa umbhalo nokubukwayo ngendlela ehambisanayo – kubalulekile kubasizi bocwaningo, izinhlelo zezobunhloli zedokhumenti, kanye ne-ai-Powered Knowledge bots.

Try it out… leave your comments and connect with me at www.linkedin.com/in/partha-sarkar-lets-talk-AI

Izinsizakusebenza

1. Amamodeli we-clip ahlelwe ngokugcwele asebenza kahle abafundi: Muchui Liu, Bozheng Li, Yunlong Yu Zhejiang University

2. I-VeccPaint: Ukuhlanganiswa kwe-vendatic okuthuthukile okuthuthukile kusetshenziswa ama-stroke isitayela priors: Juncheng Hu, Xing Xing, Jing Zhang, Qian Yu † Beihang University

3. Isu lokumaketha lezinsizakalo zezezimali: ukulima ngezimali nokucubungula umdumbula, ummbila kanye namaketanga wenani le-plancain eCôte d'Ivoire ukusuka

Source link

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button