Google AI Research Proposes Vantage: An LLM-Based Protocol for Measuring Collaboration, Creativity, and Critical Thinking

Standardized tests can tell you whether a student knows calculus or can parse a passage of text. What they cannot reliably tell you is whether that student can resolve a disagreement with a teammate, generate genuinely original ideas under pressure, or critically dismantle a flawed argument. These are the so-called durable skills (collaboration, creativity, and critical thinking), and for decades they have resisted rigorous, scalable measurement. New research from Google Research proposes a technically novel solution called Vantage: orchestrated large language models that can both simulate authentic group interaction and score the results with accuracy comparable to human expert raters.

The Core Problem: Ecological Validity vs. Psychometric Rigor
To see why this is technically interesting, it helps to understand the measurement paradox the research team was trying to crack. Measuring durable skills effectively requires two properties that pull against each other. On one hand, the assessment needs ecological validity: it should feel like a real-world scenario, because that is precisely the context in which these skills are exercised. On the other hand, it needs psychometric rigor: standardized conditions, reproducibility, and controlled stimuli, so that scores are comparable across test-takers.
Previous large-scale efforts, such as the PISA 2015 Collaborative Problem Solving assessment, tried to resolve this by having subjects interact with scripted, simulated teammates through multiple-choice questions. That guarantees control but sacrifices authenticity. Human-to-human assessments do the opposite. LLMs, the research team argues, are uniquely positioned to satisfy both requirements at once: they can produce naturalistic, open-ended conversational interactions while still being steered systematically toward specific assessment goals.
The Executive LLM: A Coordination Layer Over AI Agents
The most technically distinctive contribution of this research is the Executive LLM architecture. Rather than spawning several independent LLM agents, one per AI teammate, the system uses a single LLM to generate the responses of all AI participants in the conversation. This matters for two reasons.
First, it enables coordination. The Executive LLM has access to the same pedagogical rubric that will later be used to evaluate the human participant. It uses this rubric not just passively but actively, steering the conversation toward situations that elicit evidence of specific skills. For example, if the target dimension is Conflict Resolution, the Executive LLM may instruct one of its AI personas to introduce a disagreement and sustain it until the human participant demonstrates (or fails to demonstrate) a conflict-resolution strategy. This is functionally similar to the way a computerized adaptive test (CAT) adjusts item difficulty based on the test-taker's ongoing performance, except that here the 'items' are turns in a live conversation.
Second, the Independent Agents baseline (separate LLMs with no coordination) proved demonstrably weaker. Without steering, conversations may simply fail to produce the right evidence: if team members naturally agree, there is no conflict to resolve, and the assessment learns nothing about that sub-skill.
Gemini 2.5 Pro was used as the model underlying the Executive LLM for the main collaboration experiments, while Gemini 3 powered the creativity and critical thinking modules.
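The coordination idea described above can be sketched in a few lines. This is a minimal illustration, not the Vantage implementation: the `ExecutiveOrchestrator` class, the prompt wording, the 'Name: message' reply format, and the `LLM` callable type are all assumptions made for the example. The one point it tries to capture is that a single model call sees the rubric, the target skill, and the full transcript, and then decides which persona speaks next.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class ExecutiveOrchestrator:
    """One LLM voices every AI teammate (illustrative sketch only)."""
    llm: LLM
    personas: list[str]
    rubric: str          # assumed to be the same rubric later used for scoring
    target_skill: str
    history: list[tuple[str, str]] = field(default_factory=list)

    def build_prompt(self) -> str:
        # The prompt format is an assumption, not taken from the source.
        transcript = "\n".join(f"{who}: {text}" for who, text in self.history)
        return (
            f"You voice these teammates: {', '.join(self.personas)}.\n"
            f"Target skill: {self.target_skill}\n"
            f"Rubric:\n{self.rubric}\n"
            "Choose the teammate whose next message is most likely to elicit "
            "evidence of the target skill. Reply as 'Name: message'.\n"
            f"Transcript so far:\n{transcript}"
        )

    def next_turn(self, human_message: str) -> tuple[str, str]:
        """Record the human message and return the next (persona, message)."""
        self.history.append(("Human", human_message))
        reply = self.llm(self.build_prompt())
        speaker, _, text = reply.partition(":")
        speaker = speaker.strip()
        if speaker not in self.personas:
            # Fall back to the first persona if the reply format is off.
            speaker, text = self.personas[0], reply
        turn = (speaker, text.strip())
        self.history.append(turn)
        return turn
```

An Independent Agents baseline would instead hold one such loop per persona, each seeing only its own instructions, which is why no component is in a position to decide that a disagreement should be introduced or sustained.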
What the Experiments Actually Show
The research team recruited 188 participants aged 18-25, all native English speakers based in the United States, through the Prolific platform. Each participant produced two conversations, for a total of 373 transcripts (three were filtered out because of technical problems). All participants worked on collaborative tasks, either designing a science experiment or taking part in a structured debate, with a group of AI personas, for 30 minutes per conversation.
Two collaboration sub-skills were evaluated: Conflict Resolution (CR) and Project Management (PM). Conversations were rated by two human pedagogical raters from New York University and by an AI Evaluator (Gemini 3.0), which scored each participant turn 20 times. A turn was declared NA if any of the 20 predictions returned NA; otherwise, the final label was the most common non-NA level among the 20 runs. A regression model (linear for scores, logistic for NA decisions) was then trained on these turn-level labels to produce conversation-level scores, with performance evaluated using leave-one-out cross-validation.
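The turn-level aggregation rule described above is simple enough to write down directly. The sketch below assumes that NA is represented as `None` and that levels are integers; the function name and the tie-breaking choice (prefer the lower level) are assumptions, since the article does not say how ties are handled.

```python
from collections import Counter
from typing import Optional, Sequence


def aggregate_turn_label(predictions: Sequence[Optional[int]]) -> Optional[int]:
    """Combine repeated evaluator runs for one turn; None stands for NA."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # Any single NA prediction makes the whole turn NA.
    if any(p is None for p in predictions):
        return None
    # Otherwise take the most common level.
    counts = Counter(predictions)
    best = max(counts.values())
    # Tie-break toward the lower level (an assumption, not from the source).
    return min(level for level, n in counts.items() if n == best)
```

For example, twelve runs at level 3 and eight at level 2 yield level 3, while nineteen runs at level 3 plus a single NA yield NA.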
The main results are compelling on several fronts. Turn-level and conversation-level rates of evidence for skill-relevant behavior were significantly higher in the Executive LLM conditions than in the Independent Agents condition for both sub-skills. Conversation-level information rates reached 92.4% for Project Management and 85% for Conflict Resolution when the skill-matched Executive LLM was used. Notably, simply telling participants to focus on a skill had no significant effect on evidence rates (all p > 0.6), confirming that the steering has to come from the AI side.
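The article does not spell out how evidence rates are computed, so the following is only one plausible reading: a turn counts as evidence when its aggregated label is not NA (`None` here), and a conversation counts when it contains at least one such turn. Both function names and both definitions are assumptions made for illustration.

```python
from typing import Optional, Sequence


def turn_evidence_rate(turn_labels: Sequence[Optional[int]]) -> float:
    """Share of turns in one conversation that carry a non-NA label."""
    if not turn_labels:
        return 0.0
    return sum(label is not None for label in turn_labels) / len(turn_labels)


def conversation_evidence_rate(
    conversations: Sequence[Sequence[Optional[int]]],
) -> float:
    """Share of conversations with at least one non-NA turn (assumed definition)."""
    if not conversations:
        raise ValueError("need at least one conversation")
    with_evidence = sum(
        any(label is not None for label in conv) for conv in conversations
    )
    return with_evidence / len(conversations)
```

Under this reading, comparing conditions amounts to computing these two rates separately for Executive LLM and Independent Agents transcripts.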
On scoring accuracy, inter-rater agreement between the AI Evaluator and the human experts, measured with Cohen's Kappa, was comparable to human-human agreement, which ranged upward from moderate (κ = 0.45–0.64) across both skills and both scoring tasks.
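For readers less familiar with the metric, Cohen's Kappa corrects raw agreement for the agreement two raters would reach by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each rater's label frequencies. The sketch below is a plain, unweighted version written for illustration; whether the study used a weighted variant is not stated in the article.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's Kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Applying the same function to the human-human pair and to each human-AI pair gives numbers that can be compared directly, which is the comparison the paragraph above describes.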


Simulation as a Development Sandbox
Another useful finding for ML engineers building similar systems is the validation of LLM-based simulation as a stand-in for human subjects during protocol development. The research team used Gemini to simulate human participants at known skill levels (1–4 on each rubric dimension), then measured recovery error: the mean absolute difference between the ground-truth level and the level inferred by the autorater. The Executive LLM produced significantly lower recovery error than Independent Agents for both CR and PM. Qualitative patterns in the simulated data closely matched those from real human conversations, suggesting that rubric-based simulation can de-risk assessment design before expensive human data collection begins.
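Recovery error as defined above is a mean absolute difference, which can be computed as follows. The function name and the example levels are illustrative and not taken from the study.

```python
from typing import Sequence


def recovery_error(true_levels: Sequence[float], inferred_levels: Sequence[float]) -> float:
    """Mean absolute difference between known and autorater-inferred levels."""
    if len(true_levels) != len(inferred_levels) or not true_levels:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_levels, inferred_levels))
    return total / len(true_levels)


# Made-up example: four simulated participants at levels 1-4.
example = recovery_error([1, 2, 3, 4], [1, 3, 3, 2])
```

A lower value means the conversation gave the autorater enough evidence to place simulated participants close to the level they were generated at.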
Evidence Rates Extend Across Creativity and Critical Thinking
For creativity and critical thinking, preliminary evidence rates were assessed using simulated subjects. The results show the Executive LLM outperforming Independent Agents on all 8 dimensions tested: all six creativity dimensions (Fluidity, Originality, Quality, Building on Ideas, Elaborating, and Selecting) and both critical thinking dimensions (Interpret and Analyze; Evaluate and Judge), with every difference statistically significant. The research team noted that collection of human ratings for these two skills is ongoing and that results will be shared in future work, but the simulation results suggest that the Executive LLM approach generalizes beyond collaboration.
Creativity Scoring at 0.88 Pearson Correlation
In a separate partnership with OpenMic, an organization building AI-powered tools for assessing durable skills, the research team evaluated their Gemini-based creativity autorater on complex multimedia tasks completed by 280 high school students. The tasks involved designing a news segment based on a short story, including writing interview questions for the characters. Importantly, 100 submissions were used first to refine the Gemini prompt and the experts' pedagogical rubrics, while the remaining 180 held-out submissions were used for the final accuracy evaluation. Rubric-based scores from the OpenMic experts and from the autorater agreed at Cohen's Kappa = 0.66 (good agreement) at the item level. More strikingly, when overall submission scores were compared, the Pearson correlation between autorater and human expert totals was 0.88, a level of agreement that is hard to reach even between human raters on subjective creative tasks.
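The overall-score comparison above uses the Pearson correlation, which measures how closely two sets of totals follow a linear relationship. A dependency-free sketch is shown below; the function name is an assumption, and in practice a library routine would normally be used instead.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Note that item-level Kappa and total-score Pearson answer different questions: the first asks whether the two raters pick the same rubric level for each item, the second whether their overall totals rank and space submissions similarly, so a correlation of 0.88 can coexist with a Kappa of 0.66.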
Closing the Feedback Loop
Beyond scoring, Vantage surfaces results to users through a quantitative skills map that shows proficiency levels across all skills and sub-skills, with the option to drill down into the specific conversation excerpts that substantiate each numeric score. This makes the assessment evidence transparent and actionable, a sensible design consideration for anyone building similar evaluation pipelines where the interpretability of automated scores matters.
Key Takeaways
- A single 'Executive LLM' outperforms multiple independent agents for skill assessment: Rather than running one LLM per AI teammate, Google's Vantage uses a single coordinating LLM that generates the responses of all AI participants. This lets it actively steer conversations using a pedagogical rubric (introducing conflicts, pushing back on ideas, or creating planning bottlenecks) to elicit observable evidence of specific skills that might not surface naturally.
- LLM-based scoring is now on par with human expert ratings: The AI Evaluator's agreement with the human raters was comparable to the agreement between the two human experts themselves, who reached only moderate Cohen's Kappa (0.45–0.64) even after several rounds of calibration. This positions automated LLM scoring as a scalable alternative to expensive human annotation for complex, open-ended conversational tasks.
- Telling users to focus on a skill does nothing; the steering has to come from the AI side: Participants who were explicitly instructed to attend to conflict resolution or project management showed no statistically significant improvement in evidence rates (all p > 0.6) compared with those given no instructions. Only the Executive LLM's active steering produced measurably richer assessment data.
- LLM simulation can serve as a low-cost sandbox before running studies with real people: By simulating participants at known skill levels and measuring how accurately the system recovered those levels, the research team validated their assessment protocol without burning through expensive human-subject budgets. Simulated and real conversation patterns were qualitatively similar, making this a practical way to iterate on rubrics and prompts early in development.
- AI creativity scoring achieved a 0.88 Pearson correlation with human experts on real student work: In a real-world test on 180 held-out submissions from high school students, a Gemini-based autorater matched human expert scores at a Pearson correlation of 0.88 on overall creativity assessment, showing that automated scoring of complex, subjective, multimedia tasks is not just theoretically possible but empirically validated.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex datasets into actionable insights.



