Introducing container caching on Amazon SageMaker AI for faster model scaling

Today, we're excited to announce container image caching for Amazon SageMaker AI inference, the next major step in our journey to make auto scaling faster. It reduces end-to-end latency by up to 2x for generative AI models during scale-out events.

Over the years, Amazon SageMaker AI has continued to reduce latency across each stage of scaling: detecting the need to scale, provisioning instances, downloading container images, downloading model weights, and starting containers. Amazon SageMaker AI previously introduced sub-minute Amazon CloudWatch metrics, which help detect scale-out needs up to 6x faster than standard approaches, and an inference component data caching solution that keeps container images and model artifacts on instances that are already running. This approach reduced cold-start latency when scaling inference component workloads that reuse existing instances. Together, these features improved auto scaling responsiveness in scenarios where an inference component could be placed on an already-provisioned instance and use the existing cache.

With container caching, Amazon SageMaker AI extends these scaling improvements to scenarios where new instances must be launched. Container caching removes the container image download latency even when new instances have to start, a case that the earlier caching, which relies on instances that are already running, could not help with. In this post, we show how container caching addresses the container image download bottleneck and what performance improvements you can expect.

The scaling challenge: When new instances must start

The following diagram shows the steps during a scale-out event in which a new instance is launched.

  • Instance provisioning: A new Amazon Elastic Compute Cloud (Amazon EC2) instance is launched.
  • Container image pull: The container image is pulled from Amazon Elastic Container Registry (Amazon ECR).
  • Model artifact download: The model weights are downloaded from Amazon Simple Storage Service (Amazon S3).
  • Container startup and health checks: The inference server starts, loads the model into memory, and passes its readiness checks.

Note: The container image pull and the model artifact download happen in parallel.
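The steps above can be sketched as a simple latency model. This is a minimal illustration, not a description of how SageMaker AI schedules work internally: it assumes the image pull and the model download overlap fully, so the slower of the two gates container startup. The function name and the sample durations are hypothetical.

```python
def scale_out_latency(provision_s, image_pull_s, model_download_s, startup_s):
    """Estimate end-to-end scale-out latency in seconds.

    Assumes the image pull and the model download run in parallel, so only
    the slower of the two contributes to the total.
    """
    return provision_s + max(image_pull_s, model_download_s) + startup_s


# Hypothetical durations, for illustration only.
without_cache = scale_out_latency(60, 300, 150, 90)
with_cache = scale_out_latency(60, 0, 150, 90)  # image already cached locally
print(without_cache, with_cache)
```

In this model, removing the image pull only helps down to the point where the model download becomes the slower of the two parallel steps.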

The container image pull is often the largest contributor to endpoint scale-out latency, especially for generative AI workloads. These workloads use large containers such as SageMaker Large Model Inference (LMI, powered by vLLM), vLLM, and NVIDIA Triton. Container caching removes the container image pull step during new-instance scale-out events for the following endpoint patterns:

  • Single-model endpoints: Scaling is achieved by launching additional instances, each hosting its own copy of the model.
  • Inference component-based endpoints: Scaling adds new instances only when no existing instance has enough capacity to host an additional inference component copy.

How caching removes the image pull bottleneck

The following figure shows how the scaling timeline changes for a Qwen3-8B model (16 GB) on an ml.g6.2xlarge instance using the LMI container (17.7 GB compressed).

Figure: Timeline comparison showing scale-out latency before and after container caching for the Qwen3-8B model on an ml.g6.2xlarge instance

Before container caching:

  1. Container image pull from Amazon ECR: 333 seconds
  2. Model artifact download from Amazon S3: 168 seconds

The image pull and the model download ran in parallel. The end-to-end startup latency, which also covers the other steps in the scale-out path, was 525 seconds.

After container caching:

  1. Container image already cached locally: 0 seconds
  2. Model artifact download: 77 seconds. Because the container image is pre-cached, the model download no longer competes with the image pull for network bandwidth, which cuts its latency from 168 seconds to 77 seconds.

End-to-end startup latency drops to 258 seconds.

Result: Container caching removes the image pull from the scale-out path and eliminates the network bandwidth contention, reducing end-to-end startup latency from 525 seconds to 258 seconds, an improvement of about 51 percent. If the cached image is not available, SageMaker AI automatically falls back to pulling from Amazon ECR, so scaling is never blocked.
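The arithmetic behind these figures can be checked in a few lines of Python. The input numbers are the ones reported above. The "other steps" remainder is an inference, not a measured value: it assumes the slower of the two parallel downloads gates startup.

```python
before_total, after_total = 525, 258     # end-to-end seconds, from the text
before_image, before_model = 333, 168    # parallel downloads before caching
after_image, after_model = 0, 77         # parallel downloads after caching

# Relative reduction in end-to-end startup latency.
improvement = (before_total - after_total) / before_total
print(f"improvement: {improvement:.1%}")

# Assumption: the slower parallel download gates startup, so the rest of the
# total is time spent in the other steps (provisioning, container startup).
other_before = before_total - max(before_image, before_model)
other_after = after_total - max(after_image, after_model)
print(other_before, other_after)
```

Under that assumption, roughly three minutes of the remaining latency come from steps that container caching does not affect.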

How container caching works with inference components

Container caching works with inference components. When you deploy multiple inference components, the cache stores each unique image specified by your inference components.
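As a rough illustration of what "each unique image" means, the sketch below deduplicates the image URIs referenced by a set of inference components. The component names, image URIs, and the helper function are hypothetical placeholders; the caching itself happens inside SageMaker AI and needs no code on your side.

```python
# Hypothetical inference components; names and image URIs are placeholders.
components = [
    {"name": "chat-ic", "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/lmi:latest"},
    {"name": "summarize-ic", "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/lmi:latest"},
    {"name": "embed-ic", "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/triton:latest"},
]


def unique_images(components):
    """Return the distinct image URIs in first-seen order."""
    return list(dict.fromkeys(c["image"] for c in components))


images = unique_images(components)
print(len(images))
```

Three components that share two images result in two distinct images to cache.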

Security and tenant isolation

Container image caching maintains the same strong tenant isolation guarantees that SageMaker AI provides today. Each cache is dedicated to a single customer endpoint and is not shared across AWS accounts or endpoints. When a customer deletes their SageMaker AI endpoint, the associated image cache is cleaned up automatically.

Performance results

The following table shows results observed by early access customers who tested container caching:

| Customer | Instance | Image size | Model size | P50 before (s) | P50 after (s) | P50 improvement |
| --- | --- | --- | --- | --- | --- | --- |
| Customer 1 | ml.g4dn.xlarge | 15.7 GB | 0 GB | 381 | 134 | -65% |
| Customer 2 | ml.g5.2xlarge | 17.5 GB | 5.8 GB | 346 | 164 | -52% |
| Customer 3 | ml.g5.xlarge | 10.6 GB | 6.5 GB | 346 | 216 | -38% |

The size of the improvement depends on the endpoint's instance type, container image size, and model size.

Combining all three auto scaling optimizations

For the fastest scaling response, you can combine all three capabilities introduced across our auto scaling optimization series. Each one removes a different source of delay from the scaling path.

| # | Optimization | What it improves | How to enable |
| --- | --- | --- | --- |
| 1 | Sub-minute metrics | Detects scale-out demand up to 6x faster | Configure a ConcurrentRequestsPerModel or ConcurrentRequestsPerCopy target tracking policy |
| 2 | Data caching for inference component-based endpoints | Reduces image pull time when adding model copies on existing instances | No opt-in required: container caching works automatically on inference component-based endpoints on supported accelerator instance types. |
| 3 | Container image caching | Removes image pull time when launching new instances | No opt-in required: container caching works automatically on any endpoint that uses accelerator instance types. |

Together, these optimizations remove the major sources of scale-out latency. Sub-minute metrics detect demand up to 6x faster, triggering scaling decisions in seconds rather than minutes. The two caching layers complement each other by covering different scaling scenarios. When a new inference component copy is placed on an existing instance, data caching removes the image and model download latency. When scaling requires launching a new instance, container image caching brings the image pull time at startup down to zero.
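The target tracking policy for the sub-minute metrics is the only piece that needs configuration. The following is a minimal sketch of how such a policy might be set up with boto3 and Application Auto Scaling. The endpoint name, variant name, target value, and helper function names are hypothetical, and the predefined metric type string is an assumption that you should verify against the Application Auto Scaling documentation before use.

```python
def build_target_tracking_policy(endpoint_name, variant_name, target_value):
    """Build keyword arguments for Application Auto Scaling put_scaling_policy."""
    return {
        "PolicyName": f"{endpoint_name}-concurrency-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_value),
            "PredefinedMetricSpecification": {
                # Assumed name of the high-resolution ConcurrentRequestsPerModel metric.
                "PredefinedMetricType": "SageMakerVariantConcurrentRequestsPerModelHighResolution",
            },
        },
    }


def apply_policy(policy_kwargs):
    """Send the policy to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("application-autoscaling")
    return client.put_scaling_policy(**policy_kwargs)


# Hypothetical endpoint and variant names.
policy = build_target_tracking_policy("my-endpoint", "AllTraffic", 5)
print(policy["ResourceId"])
```

The scalable target has to be registered with Application Auto Scaling before the policy is applied. For inference component-based endpoints, the ConcurrentRequestsPerCopy metric plays the equivalent role, with a resource ID and scalable dimension that refer to the inference component rather than the variant.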

Supported configurations

Container caching is supported on accelerator instance types on SageMaker endpoints. It works with any container image hosted in Amazon ECR, including custom images. No changes to your container are required.

Container caching is available in all commercial AWS Regions where SageMaker AI inference is supported. For the latest list of supported instance types and Regions, see the Amazon SageMaker AI documentation.

Conclusion

With the new container caching, Amazon SageMaker AI provides a purpose-built auto scaling system designed for generative AI inference.

  1. Sub-minute metrics let auto scaling detect load changes up to 6x faster than standard 1-minute CloudWatch metrics.
  2. Faster scaling on existing instances: Inference component data caching removes the image pull and model download latency when running instances are reused.
  3. Faster scaling on new instances (this launch): Container caching removes the image pull when new instances are launched, reducing end-to-end scaling latency by up to 50 percent.

Together, these features change the SageMaker AI scaling experience from minutes of cold-start latency to fast, predictable responses. Your generative AI applications can now handle traffic spikes with confidence while maintaining low latency and high availability for end users.

To get started, deploy your generative AI workloads on a SageMaker AI endpoint with a supported accelerator instance type. Container caching is enabled automatically. To learn more about supported instance types and Regions, see the Amazon SageMaker AI documentation. You can also use the AWS Management Console to create or update your endpoints.

Looking ahead, we continue to invest in reducing scaling latency even further. Stay tuned.


About the authors

Mona Mona

Mona currently works as a Sr AI/ML Specialist Solutions Architect at Amazon. She previously worked at Google as a lead generative AI specialist. She is the published author of two books: Natural Language Processing with AWS AI Services: Derive strategic insights from unstructured data with Amazon Textract and Amazon Comprehend, and Google Cloud Certified Professional Machine Learning Guide. She has authored 19 blogs on AI/ML and cloud technology and co-authored a research paper on CORD19 Neural Search that won a Best Research Paper award at the prestigious AAAI (Association for the Advancement of Artificial Intelligence) conference. You can connect with Mona on LinkedIn.

Kunal Shah

Kunal is a senior software development engineer at Amazon Web Services. His passion lies in deploying machine learning (ML) models for inference, and he is driven by a strong desire to learn and contribute to the development of powerful AI tools that can create real-world impact. Outside of his professional work, he enjoys watching historical movies, traveling, and adventure sports.

Alwin (Qiyun) Zhao

Alwin (Qiyun) Zhao is a Software Development Manager on the Amazon SageMaker Inference team, building managed inference infrastructure that enables customers to deploy ML and GenAI workloads reliably at scale. Alwin leads engineering efforts across system-level performance optimization, accelerator capacity management, model deployment patterns, and security compliance, making sure that customers get high availability for their workloads.

Dmitry Soldatkin

Dmitry is a Worldwide Leader for Specialist Solutions Architecture, SageMaker Inference at AWS. He leads efforts to help customers design, build, and optimize GenAI and AI/ML solutions across the enterprise. His work spans a wide range of ML use cases, with a primary focus on generative AI, deep learning, and deploying ML at scale. He has partnered with companies across industries including financial services, insurance, and telecommunications. You can connect with Dmitry on LinkedIn.
