Reactive Machines

How Rufus doubled their inference speed and handled Prime Day traffic with AWS AI chips and parallel decoding

Large language models (LLMs) have revolutionized the way we interact with technology, but their widespread adoption has been held back by high inference latency, limited throughput, and the high costs associated with text generation. These inefficiencies are particularly pronounced during high-demand events such as Amazon Prime Day, where systems like Rufus, the Amazon AI-powered shopping assistant, must handle massive scale while meeting strict latency and throughput requirements. Rufus is an AI-powered shopping assistant designed to help customers make informed purchasing decisions. Powered by LLMs, Rufus answers customer questions about a variety of shopping needs and products and simplifies the shopping experience, as shown in the following image.

Rufus relies on many components to deliver its customer experience, including a foundation LLM (for response generation) and a query planner (QP) model for query classification and retrieval enhancement. The QP model parses customer questions to understand their intent, whether they are keyword-based or written in conversational natural language. QP is on the critical path for Rufus because Rufus cannot start token generation until QP provides its full output. Reducing QP's end-to-end text generation latency is therefore a critical requirement for reducing the first chunk latency in Rufus, which is the time taken to generate and send the first response to a user request. Lowering this latency improves perceived responsiveness and the overall user experience.

This post focuses on how the QP model used draft-centric speculative decoding (SD), also called parallel decoding, with AWS AI chips to meet the demands of Prime Day. By combining parallel decoding with AWS Trainium and Inferentia chips, Rufus achieved two times faster response times, a 50% reduction in inference costs, and seamless scalability during peak traffic.
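Because QP sits on the critical path, its latency adds directly to the first chunk latency. The following is a minimal sketch of that relationship. The function name and all timings are hypothetical values chosen for illustration, not measurements from Rufus.

```python
def first_chunk_latency_ms(qp_latency_ms: float, first_chunk_generation_ms: float) -> float:
    """QP must finish before response generation starts, so the two latencies add up."""
    return qp_latency_ms + first_chunk_generation_ms


# Hypothetical timings, for illustration only.
baseline = first_chunk_latency_ms(qp_latency_ms=300.0, first_chunk_generation_ms=200.0)
faster_qp = first_chunk_latency_ms(qp_latency_ms=150.0, first_chunk_generation_ms=200.0)

print(f"baseline first chunk latency: {baseline} ms")
print(f"with a faster QP:             {faster_qp} ms")
```

In this simplified model, every millisecond saved in QP is a millisecond saved before the customer sees the first part of the response.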

Scaling LLMs for Prime Day

Prime Day is one of the most demanding events for the Amazon infrastructure, pushing systems to their limits. In 2024, Rufus faced an unprecedented engineering challenge: handling millions of queries per minute and generating billions of tokens in real time, all while maintaining a 300 ms latency SLA for QP tasks and minimizing power consumption. These demands required a fundamental rethinking of how LLMs are deployed at scale to overcome the cost and performance bottlenecks. The key challenges of Prime Day included:

  • Massive scale: Serving millions of tokens per minute to customers worldwide, with peak traffic surges that strain even the most robust systems.
  • Strict SLAs: Delivering real-time responsiveness with a hard latency limit of 300 ms, ensuring a seamless customer experience.
  • Cost efficiency: Minimizing the cost of serving LLMs at scale while reducing power consumption, a critical factor for sustainable and economical operations.

Traditional LLM text generation is inherently inefficient because of its sequential nature. Each generated token requires a full forward pass through the model, leading to high latency and underutilization of computational resources. While techniques such as speculative decoding have been proposed to address these inefficiencies, their complexity and training overhead have limited their adoption.
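To make the sequential bottleneck concrete, here is a minimal sketch of an autoregressive decoding loop. The `toy_forward` function is a hypothetical stand-in for a full model forward pass (it is not a real LLM); the point is only that the number of forward passes equals the number of generated tokens.

```python
from typing import Callable, List, Tuple


def toy_forward(tokens: List[int], vocab_size: int = 50) -> int:
    """Hypothetical stand-in for a full forward pass: returns the next token id."""
    return (sum(tokens) * 31 + len(tokens)) % vocab_size


def autoregressive_generate(
    prompt: List[int],
    num_new_tokens: int,
    forward: Callable[[List[int]], int] = toy_forward,
) -> Tuple[List[int], int]:
    """Generate tokens one at a time and count the forward passes used."""
    tokens = list(prompt)
    forward_passes = 0
    for _ in range(num_new_tokens):
        next_token = forward(tokens)  # each new token depends on all previous ones
        forward_passes += 1
        tokens.append(next_token)
    return tokens, forward_passes


tokens, passes = autoregressive_generate([1, 2, 3], num_new_tokens=8)
print(f"generated {len(tokens) - 3} tokens with {passes} forward passes")
```

Each iteration has to wait for the previous one, so latency grows linearly with the output length no matter how much parallel compute is available.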

AWS AI chips and parallel decoding

To overcome these challenges, Rufus adopted parallel decoding, a simple yet powerful technique for accelerating LLM generation. Parallel decoding breaks the sequential dependency, which makes autoregressive generation faster. The approach adds extra decoding heads to the base model, eliminating the need for a separate draft model to produce speculated tokens. These heads predict multiple tokens for future positions in parallel, before the preceding tokens are known, which significantly improves generation efficiency.
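The idea of extra decoding heads can be sketched in a few lines. In this toy example every head is a single linear layer with random weights that reads the same final hidden state and proposes its top-k candidates for one future position. The sizes, the random weights, and the function names are all assumptions made for illustration; real heads are trained and run on the accelerator.

```python
import random
from typing import List

Matrix = List[List[float]]


def make_head(hidden_size: int, vocab_size: int, rng: random.Random) -> Matrix:
    """One toy decoding head: a vocab_size x hidden_size weight matrix."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(hidden_size)] for _ in range(vocab_size)]


def head_top_k(head: Matrix, hidden: List[float], k: int) -> List[int]:
    """Project the hidden state to logits and return the k highest-scoring token ids."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in head]
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[:k]


def propose_candidates(hidden: List[float], heads: List[Matrix], k: int) -> List[List[int]]:
    """All heads read the same hidden state, so none waits for another head's output."""
    return [head_top_k(head, hidden, k) for head in heads]


rng = random.Random(0)
HIDDEN_SIZE, VOCAB_SIZE, NUM_HEADS = 8, 20, 4  # toy sizes, not Rufus settings
heads = [make_head(HIDDEN_SIZE, VOCAB_SIZE, rng) for _ in range(NUM_HEADS)]
last_hidden = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]

candidates = propose_candidates(last_hidden, heads, k=3)
for offset, token_ids in enumerate(candidates, start=1):
    print(f"future position {offset}: top-3 candidate token ids {token_ids}")
```

Because the heads do not depend on each other, their proposals can be computed in the same step, which is what removes the token-by-token wait.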

To accelerate parallel decoding for online inference, Rufus used a combination of AWS solutions: AWS Inferentia and Trainium AI chips, Amazon Elastic Compute Cloud (Amazon EC2), and Application Load Balancer. In addition, the Rufus team partnered with NVIDIA to power the solution using NVIDIA's Triton Inference Server, which provides the capabilities to host the model on AWS chips.

To get the maximum efficiency from parallel decoding on AWS Neuron cores, we worked with the AWS Neuron team to add architectural support for parallel decoding to the NeuronX Distributed Inference (NxDI) framework for a batch size of one.

Rufus extended the base LLM with multiple decoding heads. Each head is a small neural network layer that is trained on the base model's learned representations to predict the next tokens in parallel. The heads are trained alongside the original model, while the base model itself is kept unchanged.

Because the tokens aren't generated sequentially, they must be verified to make sure that all of them fit together. To validate the tokens predicted by the draft heads, Rufus uses a tree-based attention mechanism to verify and integrate tokens. Each draft head produces several options for each position. These options are then organized into a tree-like structure to select the most promising combination. This allows multiple candidate tokens to be processed in parallel, reducing latency and increasing Neuron core utilization. The following figure shows a sparse tree constructed using our calibration set, with a depth of four, indicating that four heads are involved in the calculation process. Each node represents a token from a top-k prediction of a draft head, and the edges depict the connections between these nodes.

Hierarchical tree diagram with four levels and four main branches, each containing nodes with multiple values arranged in descending levels
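The verification step can be illustrated with a small sketch. Each tree path picks one top-k option per head, the resulting candidate sequences are checked against what the base model would have produced, and the longest matching prefix is accepted. This is a simplification under stated assumptions: `toy_base_next` is a hypothetical greedy base model, and the sketch checks candidates one by one, whereas tree-based attention verifies all of them in a single forward pass.

```python
from typing import Callable, List, Sequence, Tuple


def toy_base_next(tokens: Sequence[int]) -> int:
    """Hypothetical greedy base model: the token it would emit after `tokens`."""
    return (sum(tokens) * 7 + 3) % 10


def expand_paths(
    tree_paths: Sequence[Tuple[int, ...]],
    top_k_per_head: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Turn paths of top-k indices (one index per head depth) into token sequences."""
    return [[top_k_per_head[depth][idx] for depth, idx in enumerate(path)] for path in tree_paths]


def verify(
    prefix: Sequence[int],
    candidates: Sequence[Sequence[int]],
    base_next: Callable[[Sequence[int]], int] = toy_base_next,
) -> List[int]:
    """Accept the longest candidate prefix the base model agrees with, plus one base token."""
    best: List[int] = []
    for seq in candidates:
        context, accepted = list(prefix), []
        for tok in seq:
            if base_next(context) != tok:
                break  # first disagreement ends this branch
            accepted.append(tok)
            context.append(tok)
        if len(accepted) > len(best):
            best = accepted
    # The base model always contributes the next token, so every step makes progress.
    return best + [base_next(list(prefix) + best)]


top_k_per_head = [[4, 9], [5, 2], [6, 1]]  # toy proposals from three heads
tree_paths = [(0,), (1,), (0, 0), (0, 1), (0, 1, 0), (0, 1, 1)]
candidates = expand_paths(tree_paths, top_k_per_head)
accepted = verify(prefix=[1, 2], candidates=candidates)
print(f"tokens accepted in one step: {accepted}")
```

In this toy run one verification step yields four tokens that are identical to what token-by-token decoding with the same base model would produce, which is why output quality is preserved.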

Results of using parallel decoding

By integrating parallel decoding with AWS AI chips and the NxDI framework, we doubled the speed of text generation compared to autoregressive decoding, making it an ideal solution for the high-demand environment of Prime Day. During Amazon Prime Day 2024, Rufus demonstrated the power of AWS AI chips with impressive performance metrics:

  • Two times faster generation: AWS AI chips, optimized for parallel decoding operations, doubled the token generation speed compared to traditional processors. This parallel processing capability allowed multiple future tokens to be predicted simultaneously, delivering real-time interactions for millions of customers.
  • 50% lower inference costs: The combination of purpose-built AWS AI chips and the parallel decoding optimization eliminated redundant computations, cutting inference costs in half while maintaining response quality.
  • Simplified deployment: AWS AI chips efficiently powered the model's parallel decoding heads, enabling simultaneous token prediction without the complexity of managing separate draft models. This architectural synergy simplified deployment while delivering efficient inference at scale.
  • Seamless scalability: The combination handled peak traffic without compromising performance or response quality.

These advances not only enhanced the customer experience but also showcased the power of the NxDI framework and the adaptability of AWS AI chips for optimizing large-scale LLM performance.
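One rough way to reason about how a speedup and a cost reduction can relate is sketched below. It is a back-of-the-envelope model, not the way the Rufus numbers were measured: it assumes that each decoding step accepts a certain average number of tokens, that a step with extra heads and verification costs somewhat more than a plain step, and that the hardware cost per hour stays the same. All input values are hypothetical.

```python
def generation_speedup(avg_tokens_per_step: float, step_cost_ratio: float) -> float:
    """Speedup over one-token-per-step decoding.

    avg_tokens_per_step: average tokens accepted per decoding step (assumed).
    step_cost_ratio: time of one parallel-decoding step relative to a plain step (assumed).
    """
    return avg_tokens_per_step / step_cost_ratio


def relative_cost_per_token(speedup: float) -> float:
    """With a fixed hardware cost per hour, cost per token scales with 1 / speedup."""
    return 1.0 / speedup


# Hypothetical inputs chosen so that the result is a 2x speedup.
speedup = generation_speedup(avg_tokens_per_step=2.4, step_cost_ratio=1.2)
print(f"speedup: {speedup:.2f}x, relative cost per token: {relative_cost_per_token(speedup):.2f}")
```

Under these assumptions, doubling generation speed on the same hardware halves the cost per generated token.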

How to use parallel decoding on Trainium and Inferentia

The flexibility of NxDI combined with AWS Neuron chips makes it a powerful solution for LLM text generation in production. Whether you're using Trainium or Inferentia for inference, NxDI provides a unified interface for implementing parallel decoding optimizations. This integration reduces operational complexity and provides a straightforward path for organizations that want to deploy and scale their LLM applications efficiently.

You can explore parallel decoding techniques such as Medusa to accelerate your inference workflows on Inf2 or Trn1 instances. To get started, you'll need a Medusa-compatible model (such as text-generation-inference/Mistral-7B-Instruct-v0.2-medusa) and a Medusa tree configuration. Enable Medusa by setting is_medusa=True, configuring your medusa_speculation_length and num_medusa_heads, and specifying your medusa_tree. When using the HuggingFace generate() API, set assistant_model to your target model. Note that Medusa currently supports only a batch size of 1.

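import json

# Import path assumed from the NxD Inference package layout; adjust it to your installed version.
from neuronx_distributed_inference.models.config import NeuronConfig

def load_json_file(json_path):
    """Read the Medusa tree definition from a JSON file."""
    with open(json_path, "r") as f:
        return json.load(f)

# Tree layout that tells the decoder which candidate combinations to verify.
medusa_tree = load_json_file("medusa_mc_sim_7b_63.json")

neuron_config = NeuronConfig(
    is_medusa=True,                # enable Medusa parallel decoding
    medusa_speculation_length=64,
    num_medusa_heads=4,            # number of extra decoding heads
    medusa_tree=medusa_tree,
)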
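Before passing a tree to the configuration, it can help to sanity-check it. The sketch below assumes that the tree file holds a list of paths, where each path is a list of top-k indices with one entry per head depth, and that the number of nodes plus one root should fit within medusa_speculation_length. Both are assumptions inferred from the settings above (a 63-node tree used with a speculation length of 64), not a documented NxDI contract, and `check_medusa_tree` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def check_medusa_tree(
    tree: Sequence[Sequence[int]],
    num_medusa_heads: int,
    medusa_speculation_length: int,
) -> Tuple[int, int]:
    """Validate a tree given as a list of index paths; return (depth, node_count)."""
    paths = {tuple(path) for path in tree}
    if not paths or any(len(p) == 0 for p in paths):
        raise ValueError("tree must contain non-empty paths")

    depth = max(len(p) for p in paths)
    if depth > num_medusa_heads:
        raise ValueError(f"tree depth {depth} exceeds {num_medusa_heads} heads")

    # Every node needs its parent in the tree, otherwise the branch is unreachable.
    for p in paths:
        if len(p) > 1 and p[:-1] not in paths:
            raise ValueError(f"path {list(p)} has no parent {list(p[:-1])}")

    # Assumption: one extra slot is reserved for the root token.
    if len(paths) + 1 > medusa_speculation_length:
        raise ValueError("tree has more nodes than the speculation length allows")
    return depth, len(paths)


example_tree: List[List[int]] = [[0], [1], [0, 0], [0, 1], [0, 0, 0]]  # toy tree
depth, node_count = check_medusa_tree(example_tree, num_medusa_heads=4, medusa_speculation_length=64)
print(f"tree depth: {depth}, nodes: {node_count}")
```

If your tree file uses a different layout, adapt the checks accordingly and treat the NxDI documentation as the source of truth.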

Conclusion

Prime Day is a testament to the power of innovation to overcome technical challenges. By using AWS AI chips, Rufus not only met the stringent demands of Prime Day but also set a new standard for LLM efficiency. As LLMs continue to evolve, frameworks such as NxDI will play a crucial role in making them more accessible, scalable, and cost-efficient. We're excited to see how the community will build on the NxDI foundation and AWS AI chips to unlock new possibilities for LLM applications. Try it out today and experience the difference for yourself!

Acknowledgments

We extend our gratitude to the AWS Annapurna team responsible for AWS AI chips and framework development. Special thanks to the researchers and engineers whose contributions made this achievement possible. The improvements in latency, throughput, and cost efficiency achieved with parallel decoding compared to autoregressive decoding have set a new benchmark for LLM deployments at scale.


About the authors

Shruti Dubey is a Software Engineer on Amazon's Core Search team, where she optimizes LLM inference systems to make AI faster and more scalable. She's passionate about generative AI and loves turning cutting-edge research into real-world impact. Outside of work, you'll find her running, reading, or trying to convince her dog that she's the boss.

Shivangi Agarwal is an Applied Scientist on Amazon's Prime Video team, where she focuses on optimizing LLM inference and developing intelligent ranking systems for Prime Video. She's driven by a passion for building efficient, scalable AI that delivers real-world impact. When she's not working, you'll likely find her catching a good movie, discovering new places, or keeping up with her 3-year-old kid.

Sukhdeep Singh Kharbanda is an Applied Science Manager at Amazon Core Search. In his current role, Sukhdeep leads the Amazon Inference team in building inference optimization solutions and inference systems at scale for fast inference at low cost. Outside of work, he enjoys playing with his kid and cooking different cuisines.

Rahul Goutam is an Applied Science Manager at Amazon Core Search, where he leads teams of scientists and engineers to build scalable AI solutions that power flexible and accurate shopping experiences. When he's off the clock, he enjoys hiking a trail or skiing down one.

Yang Zhou is a software engineer working on building and optimizing machine learning systems. His recent focus is on enhancing the performance and cost efficiency of generative AI inference. Outside of work, he enjoys traveling and has recently discovered a passion for long-distance running.

RJ is an engineer at Amazon. He builds and optimizes distributed systems for training and works on optimizing inference systems to reduce latency for ML inference. Outside of work, he explores using generative AI to create food recipes.

James Park is a Principal Machine Learning Specialist Solutions Architect at Amazon Web Services. He works with Amazon to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures and new experiences, and staying up to date with the latest technology trends.
