GAIA: The LLM Agent Benchmark Everyone's Talking About

AI agents made headlines last week.

At Microsoft Build 2025, CEO Satya Nadella introduced the vision of an "open agentic web" and showed off a newer GitHub Copilot serving as a teammate in a multi-agent setup powered by Azure AI.

Google I/O 2025 quickly followed with a wave of new agentic AI features: the new Agent Mode in Gemini 2.5, the open beta of the coding assistant Jules, and seamless native support for the Model Context Protocol.

OpenAI isn't sitting still, either. They upgraded Operator, their web-browsing agent, to the new o3 model, which brings more autonomy, reasoning, and contextual awareness to everyday tasks.

Across all the announcements, one keyword keeps popping up: GAIA. Everyone seems to be racing to report their GAIA scores, but do you actually know what it is?

If you are curious about what's behind the GAIA scores, you are in the right place. In this blog, let's unpack the GAIA benchmark and discuss what it is, how it works, and why you should care about those numbers when choosing LLM agent tools.


1. Agentic AI Evaluation: From Problem to Solution

LLM agents are AI systems built around an LLM core that can perform tasks autonomously by combining natural language understanding with reasoning, planning, and tool use.

Unlike a standard LLM, they are not just passive responders to prompts. Instead, they initiate actions, adapt to context, and collaborate with humans (or even with other agents) to solve complex tasks.

As these agents grow more capable, an important question naturally follows: how do we figure out how good they are?

We need standardized benchmark evaluations.

For a while, the LLM community has relied on benchmarks that are great for testing specific LLM skills, e.g., knowledge recall on MMLU, arithmetic reasoning on GSM8K, snippet-level code generation on HumanEval, or single-turn language understanding on SuperGLUE.

These tests are certainly valuable. But here's the catch: evaluating a full-fledged AI assistant is a totally different game.

An assistant needs to autonomously plan, decide, and act over multiple steps. These dynamic, real-world skills were not the main focus of those "older" evaluation paradigms.

This highlighted a gap: we need a way to measure that all-around practical intelligence.

Enter GAIA.


2. GAIA Unpacked: What's Under the Hood?

GAIA stands for General AI Assistants benchmark [1]. The benchmark was introduced specifically to evaluate LLM agents on their ability to act as general-purpose AI assistants. It is the result of a collaborative effort by researchers from Meta-FAIR, Meta-GenAI, Hugging Face, and others associated with the AutoGPT initiative.

To understand it better, let's break the benchmark down by looking at its structure, how it scores results, and what makes it different from other benchmarks.

2.1 GAIA's Structure

GAIA is fundamentally a question-driven benchmark in which LLM agents are tasked with solving the questions. This requires them to demonstrate a broad suite of abilities, including but not limited to:

  • Logical reasoning
  • Multi-modality understanding, e.g., interpreting images, data presented in non-textual formats, etc.
  • Web browsing for retrieving information
  • Use of various software tools, e.g., code interpreters, file manipulators, etc.
  • Strategic planning
  • Aggregating information from disparate sources

Let's take a look at one of the "hard" GAIA questions.

Which of the fruits shown in the 2008 painting Embroidery from Uzbekistan were served as part of the October 1949 breakfast menu for the ocean liner later used as a floating prop in the film The Last Voyage? Give the items as a comma-separated list, ordering them clockwise from the 12 o'clock position in the painting and using the plural form of each fruit.

Solving this question forces an agent to (1) perform image recognition to label the fruits in the painting, (2) research film trivia to learn the ship's name, (3) retrieve and parse a 1949 historical menu, (4) intersect the two fruit lists, and (5) format the answer exactly as requested. This exercises several skill pillars in one go.
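To make steps (4) and (5) concrete, here is a minimal Python sketch. It assumes the earlier steps have already produced two things: a position for each fruit in the painting (relative to the painting's centre) and the set of fruits on the menu. The function names, the fruit names, and the coordinates are all invented for illustration; they are not the real painting, menu, or answer.

```python
import math


def clockwise_angle(x: float, y: float) -> float:
    """Angle in degrees, measured clockwise from the 12 o'clock direction.

    (x, y) is a position relative to the painting's centre, with y pointing up.
    """
    return math.degrees(math.atan2(x, y)) % 360


def format_answer(painting_fruits, menu_fruits, plurals):
    # Step 4: keep only the fruits that appear in both sources.
    common = [fruit for fruit in painting_fruits if fruit in menu_fruits]
    # Order clockwise, starting from the 12 o'clock position.
    common.sort(key=lambda fruit: clockwise_angle(*painting_fruits[fruit]))
    # Step 5: plural form, comma-separated, exactly as the question asks.
    return ", ".join(plurals[fruit] for fruit in common)


# Placeholder data (hypothetical, NOT the real painting or menu).
painting = {
    "fig": (0.0, 1.0),     # 12 o'clock
    "plum": (1.0, 0.0),    # 3 o'clock
    "grape": (0.0, -1.0),  # 6 o'clock
    "melon": (-1.0, 0.0),  # 9 o'clock
}
menu = {"melon", "plum", "banana"}
plurals = {"fig": "figs", "plum": "plums", "grape": "grapes", "melon": "melons"}

print(format_answer(painting, menu, plurals))  # plums, melons
```

The hard part for a real agent is of course steps (1) to (3), which produce these inputs; the sketch only shows why the final formatting instructions leave a single acceptable answer string.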

In total, the benchmark consists of 466 curated questions. They are divided into a development/validation set, which is public, and a private test set of 300 questions whose answers are withheld to power the official leaderboard. A distinctive characteristic of GAIA is that it is designed for unambiguous, factual answers. This greatly simplifies the evaluation process and also ensures consistency in scoring.
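Because answers are short and factual, scoring can be reduced to a string comparison after light normalization. The following is a simplified sketch of that idea, not the official GAIA scorer; the normalization rules (lowercasing, whitespace collapsing, order-sensitive comparison of comma-separated items) are assumptions chosen for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def is_correct(prediction: str, truth: str) -> bool:
    # Comma-separated answers are compared item by item, and order matters.
    predicted_items = [normalize(item) for item in prediction.split(",")]
    true_items = [normalize(item) for item in truth.split(",")]
    return predicted_items == true_items


print(is_correct("Plums,  Melons", "plums, melons"))  # True
print(is_correct("melons, plums", "plums, melons"))   # False
```

A scorer like this needs no human judge and gives the same verdict every time, which is what makes leaderboard numbers comparable across submissions.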

GAIA questions are organized into three difficulty levels. The idea behind this design is to probe progressively more complex capabilities:

  • Level 1: These tasks are intended to be solvable by very proficient LLMs. They typically require fewer than five steps to complete and involve minimal tool usage.
  • Level 2: These tasks demand more complex reasoning and the proper use of multiple tools. The solution generally involves between five and ten steps.
  • Level 3: These are the most challenging tasks in the benchmark. Answering them successfully requires long-term planning and the sophisticated integration of diverse tools.

Now that we understand what GAIA tests, let's examine how it measures success.

2.2 GAIA's Scoring

The performance of an LLM agent is measured primarily along two dimensions: accuracy and cost.

Accuracy is undoubtedly the main metric for assessing performance. What's special about GAIA is that accuracy is usually not reported only as an overall score across all questions. In addition, individual scores for each of the three difficulty levels are reported, giving a clear breakdown of an agent's capabilities on questions of varying complexity.
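The per-level breakdown described above is simple to compute. Here is a minimal sketch, assuming each evaluated question is recorded as a (level, solved) pair; the function name and the sample results are hypothetical.

```python
from collections import defaultdict


def accuracy_by_level(results):
    """results: iterable of (level, solved) pairs, e.g. (1, True)."""
    solved = defaultdict(int)
    total = defaultdict(int)
    for level, ok in results:
        total[level] += 1
        solved[level] += int(ok)
    if not total:
        raise ValueError("no results to score")
    # One accuracy per difficulty level, plus the overall figure.
    report = {f"level_{lvl}": solved[lvl] / total[lvl] for lvl in sorted(total)}
    report["overall"] = sum(solved.values()) / sum(total.values())
    return report


# Made-up results for six questions.
sample = [(1, True), (1, True), (1, False), (2, True), (2, False), (3, False)]
print(accuracy_by_level(sample))
```

In this toy sample the overall accuracy is 50%, yet the agent solves no Level 3 question, which is exactly the kind of detail a single headline number hides.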

Cost is measured in USD and reflects the total API cost incurred by an agent in attempting all tasks in the evaluation set. The cost metric is highly valuable in practice because it captures the efficiency and cost-effectiveness of deploying the agent in the real world. A high-performing agent that incurs excessive costs would be impractical at scale. Conversely, a cost-effective model may be preferable in production even if it achieves slightly lower accuracy.
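One way to reason about this trade-off is to keep only the agents that are not beaten on both accuracy and cost at once (a Pareto front). The sketch below assumes each run is summarized by an accuracy and a total cost; the agent names and numbers are invented for illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    accuracy: float  # fraction of tasks solved, between 0 and 1
    cost_usd: float  # total API cost for the whole evaluation set


def pareto_front(runs):
    """Keep the runs that no other run matches or beats on both accuracy and cost."""
    front = []
    for r in runs:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost_usd <= r.cost_usd
            and (o.accuracy > r.accuracy or o.cost_usd < r.cost_usd)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.cost_usd)


# Hypothetical runs, not real leaderboard entries.
runs = [
    Run("agent_a", 0.70, 200.0),
    Run("agent_b", 0.65, 20.0),
    Run("agent_c", 0.60, 50.0),  # worse and pricier than agent_b
    Run("agent_d", 0.30, 5.0),
]
print([r.name for r in pareto_front(runs)])  # ['agent_d', 'agent_b', 'agent_a']
```

Which point on the front to pick is then a business decision: here agent_b gives up five points of accuracy relative to agent_a for a tenth of the cost.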

To give you a clearer sense of what accuracy looks like in practice, consider the following reference points:

  • Humans achieve around 92% accuracy on GAIA tasks.
  • For comparison, early LLM agents (powered by GPT-4 with plugin support) started with scores around 15%.
  • More recent top-performing agents, e.g.

These numbers show how much agents have improved, but they also show how challenging GAIA remains, even for the top LLM agent systems.

But what makes GAIA's difficulty so meaningful for evaluating real-world agent capabilities?

2.3 GAIA's Guiding Principles

What makes GAIA stand out isn't just that it's difficult; it's that the difficulty is carefully designed to test the kinds of skills that agents need in practical, real-world scenarios. Behind this design are a few important principles:

  • Real-world difficulty: GAIA tasks are intentionally challenging. They usually require multi-step reasoning, cross-modal understanding, and the use of tools or APIs. These requirements closely mimic the kinds of tasks agents face in real applications.
  • Human interpretability: Even though these tasks can be challenging for LLM agents, they remain intuitively understandable to humans. This makes it easier for researchers and practitioners to analyze errors and trace agent behavior.
  • Non-gameability: Getting the right answer means the agent has to fully solve the task, not just guess or rely on pattern matching. GAIA also discourages overfitting by requiring reasoning traces and avoiding questions whose answers are easy to look up.
  • Simplicity of evaluation: Answers to GAIA questions are designed to be concise, factual, and unambiguous. This allows for automated (and objective) scoring, which makes large-scale comparisons more reliable and reproducible.

With a clearer understanding of GAIA under the hood, the next question is: how should we interpret these scores when we see them in research papers, product announcements, or vendor comparisons?

3. Putting GAIA Scores to Work

Not all GAIA scores are created equal, and headline numbers should be taken with a pinch of salt. Here are four key things to keep in mind:

  1. Prioritize private test set results. When looking at GAIA scores, always check how the scores were calculated. Are they based on the public validation set or the private test set? The questions and answers of the validation set are widely available online, so it is quite possible that models "memorized" them during training rather than deriving solutions through genuine reasoning. The private test set is the "real exam", while the public set is more of an "open-book exam."
  2. Look beyond overall accuracy and dig into the difficulty levels. While the overall accuracy score gives a general idea, it is often better to look more closely at how the agent performs at each difficulty level. Pay particular attention to Level 3 tasks, because strong performance there signals significant advances in an agent's capabilities for long-term planning and sophisticated tool use and integration.
  3. Seek cost-effective solutions. Always aim to identify agents that offer the best performance for a given cost. We are seeing significant progress here. For example, the recent Knowledge Graph of Thoughts (KGoT) architecture [2] can solve up to 57 tasks from the GAIA validation set (165 tasks in total) at a cost of approximately $5 with GPT-4o mini, compared to earlier versions of Hugging Face Agents that solve around 29 tasks for $187 using GPT-4o.
  4. Be aware of potential dataset imperfections. About 5% of the GAIA data (across both the validation and test sets) contains errors or ambiguities in the ground-truth answers. While this makes evaluation tricky, there is a silver lining: testing LLM agents on questions with imperfect answers can clearly show which agents truly reason and which merely regurgitate their training data.
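The cost comparison in item 3 becomes easier to read as cost per solved task. The short calculation below uses only the figures quoted in that item (57 tasks for about $5, 29 tasks for $187, 165 tasks in total); the variable and function names are just illustrative.

```python
TOTAL_TASKS = 165  # size of the set quoted in item 3


def cost_per_solved(total_cost_usd: float, solved: int) -> float:
    """Average API spend for each task the agent actually solved."""
    return total_cost_usd / solved


kgot = {"solved": 57, "cost_usd": 5.0}          # KGoT with GPT-4o mini (approximate cost)
baseline = {"solved": 29, "cost_usd": 187.0}    # earlier baseline with GPT-4o

for name, run in [("KGoT", kgot), ("baseline", baseline)]:
    rate = run["solved"] / TOTAL_TASKS
    unit_cost = cost_per_solved(run["cost_usd"], run["solved"])
    print(f"{name}: {rate:.1%} solved, ${unit_cost:.2f} per solved task")
```

That works out to roughly $0.09 per solved task against roughly $6.45, about a 70-fold difference, while the number of solved tasks nearly doubles. Since the $5 figure is approximate, the ratio should be read as an order of magnitude rather than a precise value.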

4. Conclusion

In this post, we unpacked GAIA, an agent evaluation benchmark that has quickly become the go-to option in the field. The main points to remember:

  1. GAIA is a reality check for AI assistants. It is specifically designed to test a sophisticated suite of abilities that LLM agents need as AI assistants. These skills include complex reasoning, handling different types of information, web browsing, and using various tools effectively.
  2. Look beyond the headline numbers. Check the test set source, the difficulty breakdown, and the cost-effectiveness.

GAIA represents a significant step toward evaluating LLM agents the way we actually want to use them: as autonomous assistants that can handle the messy, multi-faceted challenges of the real world.

New evaluation frameworks may well emerge, but GAIA's core principles (real-world relevance, human interpretability, and resistance to gaming) will probably remain central to how we measure AI agents.

Reference

[1] Mialon et al., GAIA: a benchmark for General AI Assistants, 2023, arXiv.

[2] Besta et al., Affordable AI Assistants with Knowledge Graph of Thoughts, 2025, arXiv.
