EmoNet: Speaker-Aware Transformers for Emotion Recognition, and What I'd Build Differently in 2026

I submitted my MS thesis on Emotion Recognition in Conversation (ERC). The model, EmoNet, reached a weighted F1 of 39.18 on EmoryNLP. That was competitive with the public PapersWithCode leaderboard at the time: it sat between TUCORE-GCN_RoBERTa (39.24) and S+PAGE (39.14), and it improved on my chosen baseline, CoMPM, by +1.81 F1.
Two years later, I went back to see where the field stands. The leaderboard is unrecognizable. The top entries are no longer encoder-only models with clever attention heads; they are LLaMA-2-7B-based systems with LoRA fine-tuning and retrieval-augmented prompting: InstructERC, CKERC, BiosERC, LaERC-S. The methods are different. The compute is different. The mindset is different.
And yet, reading these new papers carefully, I can see the core ideas I proposed in EmoNet inside them, just implemented at a different layer of the stack. This is the story of what I built, where it sits, and what I would build it with if I were starting from scratch.
What ERC is, and why text-only ERC is hard
Emotion Recognition in Conversation is the task of assigning an emotion label to every utterance in a dialogue. It differs from sentence-level sentiment analysis in one important way: an utterance's emotion is shaped by what came before it, and by who is speaking.
Consider this exchange from the EmoryNLP dataset (drawn from the TV show Friends):
Monica: Wendy, we had a deal! Yeah, you promised! Wendy! Wendy! Wendy! [Mad]
Rachel: Who was that? [Neutral]
Monica: Wendy bailed. I have no waitress. [Mad]
In isolation, “Who was that?” is emotionally neutral. The Neutral label only makes sense in context: it sits between two angry utterances from a different speaker, and an ERC model has to capture that conversational dynamic.
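To make the framing concrete, here is a minimal sketch of one way to represent a dialogue for ERC and to assemble the model input for a single target utterance from the turns before it. The `Utterance` class, the `build_context` helper, the `[TARGET]` marker, and the `max_turns` parameter are hypothetical illustrations, not EmoNet's or EmoryNLP's actual preprocessing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    speaker: str
    text: str
    label: Optional[str] = None  # gold emotion; never copied into the model input


def build_context(dialogue: List[Utterance], target: int, max_turns: int = 3) -> str:
    """Return the target utterance plus up to `max_turns` preceding turns,
    each prefixed with its speaker so the model can see who said what."""
    start = max(0, target - max_turns)
    turns = [f"{u.speaker}: {u.text}" for u in dialogue[start:target + 1]]
    turns[-1] = "[TARGET] " + turns[-1]  # the one utterance we want a label for
    return "\n".join(turns)


dialogue = [
    Utterance("Monica", "Wendy, we had a deal! Yeah, you promised! Wendy! Wendy! Wendy!", "Mad"),
    Utterance("Rachel", "Who was that?", "Neutral"),
    Utterance("Monica", "Wendy bailed. I have no waitress.", "Mad"),
]

print(build_context(dialogue, target=1))
```

The gold labels travel alongside the text but are never placed in the input; the model only sees speakers and words, which is exactly why Rachel's line has to be read against Monica's.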
There is a second wrinkle: multimodal information is missing. In real human conversation, tone of voice, facial expression, and body language carry a large share of the emotional signal. Text-only ERC strips all of that away. The same words, “Oh, great.”, can be sincere or sarcastic, and text alone often cannot tell you which.
This loss of information is a major challenge. The model has to extract emotion from a noisier signal than the one a human-level benchmark has access to.
The 2024 landscape
When I started my thesis in late 2023, the EmoryNLP leaderboard was dominated by transformer-based architectures with a variety of clever modifications. A quick tour:
– KET (Zhong et al., 2019): a knowledge-enriched transformer with affective graph attention, the first paper to bring transformers to ERC.
– DialogueGCN (Ghosal et al., 2019): a graph convolutional network that recast conversations as node-classification problems.
– RGAT (Ishiwatari et al., 2020): relation-aware graph attention with relational position encodings for speaker dependencies.
– DialogXL (Shen et al., 2020): XLNet adapted with utterance recurrence and dialog-aware self-attention.
– HiTrans (Li et al., 2020): a hierarchical transformer with pairwise utterance speaker verification as an auxiliary task.
– TUCORE-GCN (Lee & Choi, 2021): a heterogeneous dialogue graph with speaker-aware BERT.
– CoMPM (Lee & Lee, 2021): dialogue context combined with pre-trained speaker memory tracking.
I chose CoMPM as my baseline for two reasons. First, it explicitly modeled pre-trained speaker memory as a separate module, mirroring my own view that who is speaking matters as much as what they say. Second, its architecture was modular enough to extend without rewriting from scratch. The CoMPM paper showed that adding pre-trained memory to a context model gives a measurable boost, but its notion of speaker identity was still local to each conversation. When a new conversation started, everything the model had learned about a speaker was thrown away.
That looked like a problem worth solving.
Three contributions, with intuition
1. Global Speaker Identity
Problem. In CoMPM and earlier work, speaker IDs are assigned per conversation. Speaker A in scene 1 has no connection to Speaker A in scene 14, even if they are the same person. So every dialogue starts cold.
Intuition. People have characteristic emotional patterns. Monica gets angry about specific things; Phoebe is reliably cheerful; Ross has unpredictable bouts of insecurity. If a model can carry knowledge about a specific speaker across conversations, it should be able to make better-calibrated predictions when that speaker shows up again.
Implementation. Every unique speaker across the entire dataset gets a stable, dataset-wide ID. The first time Monica Geller appears, she is assigned an ID (say, ID 7) that stays with her. Every later appearance, across episodes, seasons, and scenes, is still ID 7. The model can now learn persistent speaker-specific patterns.
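A minimal sketch of that ID assignment, contrasted with per-conversation numbering. `SpeakerRegistry` and `local_ids` are hypothetical names for illustration; in practice the speaker names would come from the dataset's metadata rather than hard-coded scene lists.

```python
from typing import Dict, List


class SpeakerRegistry:
    """Assigns each speaker name a stable, dataset-wide integer ID the first time
    it is seen, and returns the same ID on every later appearance."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def get_id(self, speaker: str) -> int:
        # setdefault only inserts for new names, so an ID never changes once assigned.
        return self._ids.setdefault(speaker, len(self._ids))

    def __len__(self) -> int:
        return len(self._ids)


def local_ids(scene: List[str]) -> List[int]:
    """Per-conversation IDs, for contrast: numbering restarts in every scene."""
    seen: Dict[str, int] = {}
    return [seen.setdefault(s, len(seen)) for s in scene]


scene_1 = ["Monica", "Rachel", "Monica"]
scene_14 = ["Ross", "Monica", "Ross"]

registry = SpeakerRegistry()
global_1 = [registry.get_id(s) for s in scene_1]
global_14 = [registry.get_id(s) for s in scene_14]

print(local_ids(scene_1), local_ids(scene_14))  # local ID 0 is Monica in scene 1 but Ross in scene 14
print(global_1, global_14)                      # Monica is 0 in both scenes
```

With local IDs, the same integer refers to different people in different scenes, so nothing learned about "speaker 0" can transfer. The global registry removes that ambiguity at the cost of one embedding row per distinct speaker.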
This seems obvious in hindsight. In 2024, it was not how the leaderboard models worked.
2. Speaker Behavior Module
Problem. Global Speaker Identity on its own is just a label. To make it useful, the model needs to do something with the speaker's accumulated history. How do you give a transformer access to “everything Monica has ever said in this dataset” without blowing up the context window or making training infeasible?
Intuition. Recurrence. A GRU is a natural fit for sequentially compressing a speaker's past utterances into a single fixed-size representation. Recent utterances contribute most; older ones gradually fade. A configurable sliding window bounds the GRU's input (say, the last N utterances by this speaker), keeping compute and memory predictable.
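The bounded window is straightforward to sketch with a fixed-length deque per speaker. This assumes each utterance has already been turned into an embedding (plain lists of floats here) and that speakers are keyed by their global IDs; `SpeakerHistory` and its methods are hypothetical names.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence

Vector = List[float]


class SpeakerHistory:
    """Keeps only the last `window` utterance embeddings per global speaker ID,
    so the recurrent encoder's input length (and cost) stays bounded."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) drops the oldest entry automatically once full.
        self._buffers: Dict[int, Deque[Vector]] = defaultdict(lambda: deque(maxlen=window))

    def add(self, speaker_id: int, embedding: Sequence[float]) -> None:
        self._buffers[speaker_id].append(list(embedding))

    def recent(self, speaker_id: int) -> List[Vector]:
        # Oldest first; .get avoids creating an empty buffer for unseen speakers.
        return list(self._buffers.get(speaker_id, ()))


history = SpeakerHistory(window=2)
for embedding in ([0.1, 0.0], [0.2, 0.0], [0.3, 0.0]):
    history.add(7, embedding)  # speaker ID 7, as in the Monica example

print(history.recent(7))  # the first embedding has been evicted
```

Whatever the dataset size, each speaker costs at most `window` stored vectors, and the recurrent encoder never processes more than `window` steps per prediction.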
Implementation. Each utterance is encoded independently with a pre-trained RoBERTa backbone. The resulting embeddings flow into a unidirectional GRU. The GRU's final hidden state, call it `kt`, represents the speaker's current behavior pattern. It is projected to the same dimension as the dialogue-context output and then added to it. The combined signal feeds the final classifier.
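The data flow above can be sketched without any deep-learning framework. This is a dependency-free illustration under stated assumptions: a bias-free GRU cell with the standard update, reset, and candidate gates, randomly initialized (untrained) weights, and toy vectors standing in for RoBERTa embeddings and the dialogue-context output. `TinyGRU`, `fuse`, and `proj` are hypothetical names, not EmoNet's code.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Bias-free, unidirectional GRU with update (z), reset (r) and candidate (n)
    gates. Weights are random: this shows the data flow, not a trained model."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Per gate: an input-to-hidden matrix W and a hidden-to-hidden matrix U.
        self.w = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.u = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}

    def step(self, x: Sequence[float], h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w["z"], x), matvec(self.u["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w["r"], x), matvec(self.u["r"], h))]
        # The reset gate scales how much of the previous state enters the candidate.
        n = [math.tanh(a + ri * b)
             for a, ri, b in zip(matvec(self.w["n"], x), r, matvec(self.u["n"], h))]
        # New state: convex mix of the candidate and the previous state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, history: Sequence[Sequence[float]]) -> Vector:
        h = [0.0] * self.hidden_dim  # no history -> zero vector
        for x in history:  # oldest first, so the latest utterance is applied last
            h = self.step(x, h)
        return h  # final hidden state, the "kt" of the text


def fuse(context: Sequence[float], k_t: Sequence[float], proj: Matrix) -> Vector:
    """Project k_t to the context dimension, then add it element-wise."""
    projected = matvec(proj, k_t)
    if len(projected) != len(context):
        raise ValueError("projection output must match the context dimension")
    return [c + p for c, p in zip(context, projected)]


gru = TinyGRU(input_dim=4, hidden_dim=3)
history = [[0.1, 0.0, -0.2, 0.3], [0.5, 0.1, 0.0, -0.1]]  # stand-ins for RoBERTa embeddings
k_t = gru.encode(history)
proj = rand_matrix(5, 3, random.Random(1))  # hidden size 3 -> context size 5
context = [0.2] * 5                         # stand-in for the dialogue-context output
fused = fuse(context, k_t, proj)            # what the classifier would receive
```

In a PyTorch implementation the same roles would more naturally be played by `torch.nn.GRU` (e.g. with `batch_first=True`) and an `nn.Linear` projection, trained end to end with the classifier. Addition rather than concatenation keeps the classifier's input size unchanged, which is what makes the module easy to bolt onto an existing context model.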
The design is structurally similar to CoMPM's pre-trained memory module, but with two key differences: it pools the speaker's history globally (not just within the current conversation), and the GRU explicitly models temporal decay.
3. Weighted Cross-Entropy Loss
Problem. EmoryNLP is imbalanced: Neutral outnumbers Sad by roughly 4.5:1. Many papers handle this with data augmentation or undersampling. But conversational data is sequential: dropping or duplicating utterances distorts the natural emotional flow, which is exactly the signal the model is trying to learn from.
Intuition. If you can't safely change the data, change the loss. Upweight the rare classes so that a single misclassification of Sad costs the model more than a single misclassification of Neutral.
Implementation. Cross-entropy with per-class weights derived from inverse class frequency, then normalized. Nothing unusual, but with the sequential structure of dialogue as the explicit motivation, it becomes a principled choice rather than an ad hoc one.
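A minimal sketch of inverse-frequency weighting and the resulting single-example loss. The class counts below are made up to match the 4.5:1 ratio and are not the real EmoryNLP statistics; rescaling the weights so they average 1.0 is one common normalization, assumed here rather than taken from the thesis.

```python
import math
from typing import Dict, Sequence


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by 1 / count, then rescale so the weights average 1.0."""
    raw = {label: 1.0 / n for label, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {label: w * scale for label, w in raw.items()}


def weighted_cross_entropy(logits: Sequence[float], target: int, weights: Sequence[float]) -> float:
    """Single-example loss: -weights[target] * log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return -weights[target] * (logits[target] - log_sum_exp)


# Illustrative counts with a 4.5:1 Neutral-to-Sad ratio; not the real EmoryNLP counts.
counts = {"Neutral": 450, "Sad": 100}
weights = inverse_frequency_weights(counts)
w = [weights["Neutral"], weights["Sad"]]  # index 0 = Neutral, index 1 = Sad

# Equally confident and wrong both times: a missed Sad now costs 4.5x a missed Neutral.
missed_sad = weighted_cross_entropy([2.0, 0.0], target=1, weights=w)
missed_neutral = weighted_cross_entropy([0.0, 2.0], target=0, weights=w)
print(round(missed_sad / missed_neutral, 2))  # 4.5
```

In PyTorch, `nn.CrossEntropyLoss(weight=...)` accepts such a weight tensor directly. Its default `reduction="mean"` divides by the summed weights of the targets in the batch rather than by the batch size, which is worth knowing when comparing loss curves across weighting schemes.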
Results: what worked, and what surprised me
Here is the ablation table from the thesis:

The result that surprised me, and which I think is the most honest part of this work, is the second row. Adding Global Speaker Identity on its own made the model much worse (F1 dropped from 37.85 to 29.43). At first, that looked like a failure.
But it wasn't. Global Speaker Identity is a capability: it gives the model the means to learn long-range speaker patterns. On its own, that capability adds representational load that the rest of the model cannot absorb. Only once the Speaker Behavior Module was added, giving the model a structured way to use global identity, did the contribution show up. In the final configuration, EmoNet recovered and beat the CoMPM baseline by 1.81 F1.
This is the lesson I'd take from the ablation: a feature doesn't matter on its own; it matters when it is paired with the machinery that uses it. Research papers that report “this addition gave us +X%” often hide the ablation rows where the addition alone made things worse. I chose to keep that row in.

The full model captured Neutral, Joyful, and Scared well. Powerful remained the hardest class, partly because it is rare, and partly because Powerful and Joyful are nearly indistinguishable in text-only dialogue without acoustic cues. This is the multimodal problem showing up as a text problem.
Reflection (2026): the field moved, and so should we
Two years on, the EmoryNLP leaderboard looks completely different. The leading systems now are:
– InstructERC (Lei et al., 2023): reframes ERC as a generative LLM task. It uses retrieval-augmented instruction templates plus auxiliary tasks such as speaker identification and emotion prediction to better model conversational roles and emotional dynamics.
– CKERC (Fu, 2024): introduces commonsense-enhanced ERC. For each utterance, an LLM generates commonsense annotations about the speaker's intent and the listener's likely reaction, providing explicit social and emotional reasoning beyond the surface dialogue context.
– BiosERC (Xue et al., 2024): injects LLM-generated biographical information about speakers into the ERC process, letting the model reason not only over utterance context but also over speaker-specific characteristics.
– LaERC-S (Fu et al., 2025): a two-stage setup. Stage 1: equip the LLM with speaker-specific characteristics. Stage 2: use those characteristics during the ERC task itself.
Look carefully at the last two.
BiosERC's speaker biographies are Global Speaker Identity in spirit, upgraded: instead of an integer ID, a textual profile the LLM can attend to. LaERC-S's speaker characteristics are the Speaker Behavior Module in spirit: historical speaker patterns made available to the model, but folded into instruction tuning rather than implemented as a separate GRU.
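To make the “integer ID versus textual profile” contrast concrete, here is a toy sketch that renders a speaker's labeled emotion history as one line of text an LLM could read. The `speaker_profile` helper and its wording are hypothetical, and counting labels is only a crude stand-in for the LLM-generated biographies BiosERC actually uses.

```python
from collections import Counter
from typing import Sequence


def speaker_profile(name: str, past_emotions: Sequence[str], top: int = 3) -> str:
    """Turn a speaker's labeled history into a short text profile.

    A deliberately crude, non-LLM stand-in: BiosERC generates richer
    biographies with an LLM instead of counting labels."""
    if not past_emotions:
        return f"{name}: no earlier utterances."
    counts = Counter(past_emotions).most_common(top)
    summary = ", ".join(f"{label} ({n})" for label, n in counts)
    return f"{name}: {len(past_emotions)} earlier utterances, most often {summary}."


print(speaker_profile("Monica", ["Mad", "Neutral", "Mad", "Joyful", "Mad", "Neutral"]))
# Monica: 6 earlier utterances, most often Mad (3), Neutral (2), Joyful (1).
```

Where an integer ID only becomes meaningful through a learned embedding, a text profile is interpretable as-is and can be dropped straight into a prompt.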
The architectural ideas survived. The implementation layer changed.
This is the part I find genuinely interesting. While working on EmoNet in 2024, I was thinking inside the encoder-only transformer paradigm: “how do I add another module to the architecture?” The 2024-2025 papers think inside the LLM paradigm: “how do I fold this idea into instruction tuning or the retrieval context?” The ideas are the same; the points of leverage are different.
If I were rebuilding EmoNet today, I wouldn't start from RoBERTa-large. I'd start from a small open-source LLM (LLaMA-3.2-3B, Qwen-2.5-3B, or Phi-3.5) and use LoRA to fine-tune it on EmoryNLP, following the InstructERC family of methods. Global Speaker Identity becomes a textual speaker history retrieved from a vector store. The Speaker Behavior Module becomes few-shot prompting with the speaker's recent emotional history. Weighted Loss survives almost unchanged: class imbalance doesn't care which model you use.
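Here is a minimal sketch of the retrieval-plus-prompt half of that rebuild, under loud assumptions: the embeddings are toy three-dimensional vectors, a linear scan stands in for a vector store, the example utterances are invented, and the `Memory` class, `retrieve`, `build_prompt`, and the prompt wording are all hypothetical rather than InstructERC's actual template. The LoRA fine-tuning step is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

EMORYNLP_LABELS = ["Joyful", "Mad", "Neutral", "Peaceful", "Powerful", "Sad", "Scared"]


@dataclass
class Memory:
    speaker: str
    text: str
    emotion: str
    embedding: List[float]  # from any sentence encoder; toy values below


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store: List[Memory], speaker: str, query: Sequence[float], k: int) -> List[Memory]:
    """Top-k past utterances by the same speaker, ranked by cosine similarity.
    A real system would query a vector database; a linear scan is enough here."""
    candidates = [m for m in store if m.speaker == speaker]
    return sorted(candidates, key=lambda m: cosine(m.embedding, query), reverse=True)[:k]


def build_prompt(speaker: str, utterance: str, shots: List[Memory], labels: List[str]) -> str:
    """Instruction prompt with the speaker's retrieved history as few-shot examples."""
    lines = [f"Classify the emotion of the final utterance. Options: {', '.join(labels)}.",
             f"Earlier utterances by {speaker}:"]
    lines += [f'- "{m.text}" -> {m.emotion}' for m in shots]
    lines += [f'Utterance: {speaker}: "{utterance}"', "Emotion:"]
    return "\n".join(lines)


store = [  # invented toy memories, not EmoryNLP data
    Memory("Monica", "Who moved my chair?", "Mad", [0.9, 0.1, 0.0]),
    Memory("Monica", "I love this recipe!", "Joyful", [0.1, 0.9, 0.0]),
    Memory("Ross", "We were on a break!", "Mad", [0.8, 0.2, 0.1]),
]
shots = retrieve(store, "Monica", query=[1.0, 0.0, 0.0], k=1)
prompt = build_prompt("Monica", "Wendy bailed. I have no waitress.", shots, EMORYNLP_LABELS)
print(prompt)
```

Filtering by speaker before ranking is what keeps this a speaker-aware retrieval: Ross's similar-sounding angry line is never offered as evidence about Monica.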
The architecture diagram would look completely different. The conceptual debt to the 2024 thesis would be visible if you know where to look.
It taught me that research ideas have a longer half-life than I expected: they survive paradigm shifts even when their implementations don't.
Where this leaves me
EmoNet is now publicly archived under DOI 10.5281/zenodo.20048006, with the full thesis, the defense slides, and a PyTorch implementation on GitHub. I'm currently working on a modern port, a LoRA-fine-tuned LLM with retrieval-grounded speaker context, as a follow-up project I'll write about soon.
If you work on conversational AI, applied NLP, or LLM fine-tuning, I'd be interested to hear what you're building.



