Introducing container caching in Amazon SageMaker AI for faster model scaling

Today, we are excited to announce container image caching for Amazon SageMaker AI inference, the next major step in our work to speed up scaling. It makes end-to-end scale-out up to 2x faster for generative AI models.
Over the years, Amazon SageMaker AI has kept reducing latency at every stage of scaling: detecting the need to scale, provisioning instances, downloading container images, downloading model weights, and starting containers. Amazon SageMaker AI previously introduced sub-minute Amazon CloudWatch metrics, which detect scale-out needs up to 6x faster than standard metrics, and an inference component data caching solution that caches container images and model artifacts on instances that are already running. This approach reduced cold-start latency when scaling inference component workloads that reuse existing instances. Together, these features improved auto scaling responsiveness in scenarios where an inference component can be placed on an already provisioned instance and use the existing cache.
With container caching, Amazon SageMaker AI extends these scaling improvements to scenarios where new instances must be launched. Container caching removes the container image download latency even when new instances have to start, a scenario that the earlier data caching, which lives on already running instances, couldn't cover. In this post, we show how container caching addresses the container image download bottleneck and what performance improvements you can expect.
The scaling challenge: When new instances must launch
The following diagram shows the steps during a scale-out event in which a new instance is launched.
- Instance provisioning: A new Amazon Elastic Compute Cloud (Amazon EC2) instance is launched.
- Container image pull: The container image is pulled from Amazon Elastic Container Registry (Amazon ECR).
- Model artifact download: The model weights are downloaded from Amazon Simple Storage Service (Amazon S3).
- Container startup and health checks: The inference server starts, loads the model into memory, and passes its readiness checks.
Note: The container image pull and the model artifact download happen in parallel.
The container image download is often the largest contributor to endpoint scale-out latency, especially for generative AI workloads. These workloads use large containers such as SageMaker Large Model Inference (LMI, powered by vLLM), vLLM, and NVIDIA Triton. Container caching removes the container image pull step during new-instance scale-out events for the following endpoint patterns:
- Single-model endpoints: Scaling is achieved by launching additional instances, each hosting its own copy of the model.
- Inference component-based endpoints: Scaling adds new instances only when no existing instance has enough capacity to host an additional inference component copy.
How container caching removes the image pull bottleneck
The following figure shows how the scaling timeline changes for a Qwen3-8B model (16 GB) on an ml.g6.2xlarge instance using the LMI container (17.7 GB compressed).

Before container caching:
- Container image pull from Amazon ECR: 333 seconds
- Model artifact download from Amazon S3: 168 seconds
The image pull and the model download ran in parallel, and the end-to-end startup latency was 525 seconds.
After container caching:
- Container image already cached locally: 0 seconds
- Model artifact download: 77 seconds. Because the container image is already cached, the model download no longer competes with the image pull for network bandwidth, which cuts its latency from 168 seconds to 77 seconds.
The end-to-end startup latency drops to 258 seconds.
Result: Container caching takes the image pull off the scale-out path and removes the network bandwidth contention, reducing end-to-end startup latency from 525 seconds to 258 seconds, an improvement of about 51 percent. If a cached image isn't available, SageMaker AI automatically falls back to pulling it from Amazon ECR, so scaling is never blocked.
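The arithmetic behind these figures can be checked with a short Python sketch. It uses only the timings reported above; the helper names are ours, and splitting the timeline into a download phase plus "other stages" assumes that everything outside the two parallel downloads runs sequentially, which the post doesn't state explicitly.

```python
# Reported timings from the Qwen3-8B example above, in seconds.
BEFORE = {"image_pull": 333, "model_download": 168, "end_to_end": 525}
AFTER = {"image_pull": 0, "model_download": 77, "end_to_end": 258}


def download_phase(timings: dict) -> int:
    """Image pull and model download overlap, so the slower one sets the pace."""
    return max(timings["image_pull"], timings["model_download"])


def other_stages(timings: dict) -> int:
    """Assumption: all time outside the download phase is sequential work
    (instance provisioning, container startup, health checks)."""
    return timings["end_to_end"] - download_phase(timings)


def improvement_percent(before: float, after: float) -> float:
    """Relative latency reduction, as a percentage of the 'before' value."""
    return (before - after) / before * 100


saved = BEFORE["end_to_end"] - AFTER["end_to_end"]
pct = improvement_percent(BEFORE["end_to_end"], AFTER["end_to_end"])
speedup = BEFORE["end_to_end"] / AFTER["end_to_end"]

print(f"download phase: {download_phase(BEFORE)} s -> {download_phase(AFTER)} s")
print(f"other stages:   {other_stages(BEFORE)} s -> {other_stages(AFTER)} s")
print(f"saved {saved} s ({pct:.1f}%), speedup {speedup:.2f}x")
```

Under that assumption, the download phase shrinks from 333 seconds to 77 seconds, which accounts for 256 of the 267 seconds saved.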
How container caching works with inference components
Container caching works with inference components. When you deploy multiple inference components, the cache stores each unique image specified by your inference components.
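To make "each unique image" concrete, here is a minimal Python sketch of the deduplication idea. It isn't how SageMaker AI implements the cache: `InferenceComponentSpec`, `images_to_cache`, and the image URIs are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceComponentSpec:
    """Hypothetical, simplified stand-in for an inference component definition."""
    name: str
    image_uri: str


def images_to_cache(components: list[InferenceComponentSpec]) -> list[str]:
    """Return each distinct image URI once, in first-seen order."""
    return list(dict.fromkeys(c.image_uri for c in components))


# Placeholder URIs for illustration only.
components = [
    InferenceComponentSpec("chat-model", "registry.example/lmi:latest"),
    InferenceComponentSpec("summarizer", "registry.example/lmi:latest"),
    InferenceComponentSpec("reranker", "registry.example/triton:latest"),
]

print(images_to_cache(components))
```

Three components share two images here, so only two images would need a cache entry.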
Security and tenant isolation
Container image caching keeps the same strong tenant isolation guarantees that SageMaker AI provides today. Each cache is dedicated to a single customer endpoint and is not shared across AWS accounts or endpoints. When a customer deletes their SageMaker AI endpoint, the associated image cache is cleaned up automatically.
Performance results
The following table shows results measured by early access customers who tested container caching:
| Customer | Instance | Image size | Model size | P50 before (seconds) | P50 after (seconds) | P50 improvement |
|---|---|---|---|---|---|---|
| Customer 1 | ml.g4dn.xlarge | 15.7 GB | 0 GB | 381 | 134 | -65% |
| Customer 2 | ml.g5.2xlarge | 17.5 GB | 5.8 GB | 346 | 164 | -52% |
| Customer 3 | ml.g5.xlarge | 10.6 GB | 6.5 GB | 346 | 216 | -38% |
The size of the improvement depends on the endpoint's instance type, container image size, and model size.
Combining all three auto scaling optimizations
For the fastest scaling response, you can combine all three capabilities introduced across our auto scaling optimization series. Each one removes a different source of delay from the scaling path.
| Optimization | What it improves | How to enable |
|---|---|---|
| Sub-minute metrics | Triggers scale-out up to 6x faster | Configure a ConcurrentRequestsPerModel or ConcurrentRequestsPerCopy target tracking policy |
| Inference component data caching | Reduces image pull time when adding model copies to existing instances | No opt-in required: this caching works automatically for inference component-based endpoints on supported accelerator instance types. |
| Container image caching | Removes image pull time when new instances are launched | No opt-in required: container caching works automatically for any endpoint that uses supported accelerator instance types. |
Together, these optimizations remove the major sources of scale-out latency. Sub-minute metrics detect demand up to 6x faster, triggering scaling decisions in seconds rather than minutes. The two caching layers complement each other by covering different scaling scenarios. When a new inference component copy is placed on an existing instance, data caching removes the image and model download latency. When scaling requires launching a new instance, container image caching brings the image pull time at startup down to zero.
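The target tracking policy from the first table row is the only one of the three that needs configuration. The following Python sketch shows one way it might look with boto3 and Application Auto Scaling for a single-model endpoint. The endpoint name, variant name, capacity limits, and target value are placeholders, and the predefined metric type `SageMakerVariantConcurrentRequestsPerModelHighResolution` is given to the best of our knowledge, so check it against the Application Auto Scaling documentation before relying on it.

```python
def build_scaling_requests(endpoint_name, variant_name, target_concurrency,
                           min_instances=1, max_instances=4):
    """Build the two request payloads without calling AWS (easy to inspect)."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-concurrency-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Desired concurrent requests per model copy (placeholder value).
            "TargetValue": float(target_concurrency),
            "PredefinedMetricSpecification": {
                # Assumed name of the sub-minute (high-resolution) metric.
                "PredefinedMetricType":
                    "SageMakerVariantConcurrentRequestsPerModelHighResolution",
            },
        },
    }
    return scalable_target, scaling_policy


def apply_scaling(scalable_target, scaling_policy):
    """Send both requests; needs boto3 and AWS credentials at call time."""
    import boto3  # imported here so the builder above runs without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target)
    client.put_scaling_policy(**scaling_policy)


target, policy = build_scaling_requests("demo-endpoint", "AllTraffic", 5)
print(target["ResourceId"])
```

Keeping the payload builder separate from the AWS call lets you review or unit test the configuration before applying it with `apply_scaling(target, policy)`.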
Supported configurations
Container caching is supported on accelerator instance types for SageMaker endpoints. It works with any container image hosted in Amazon ECR, including custom images. No changes to your container are required.
Container caching is available in all commercial AWS Regions where SageMaker AI inference is supported. For the latest list of supported instance types and Regions, see the Amazon SageMaker AI documentation.
Conclusion
With the new container caching, Amazon SageMaker AI provides a purpose-built auto scaling system designed for generative AI inference.
- Faster detection: Sub-minute metrics let auto scaling detect load changes up to 6x faster than standard 1-minute CloudWatch metrics.
- Faster scaling on existing instances: Data caching on running instances removes the image pull and model download latency when those instances are reused.
- Faster scaling on new instances (this launch): Container caching removes the image pull when new instances are launched, reducing end-to-end scaling latency by up to 50 percent.
Together, these features change the SageMaker AI scaling experience from minutes of cold-start latency to fast, predictable responses. Your generative AI applications can now handle traffic spikes with confidence, maintaining low latency and high availability for end users.
To get started, deploy your generative AI workloads on a SageMaker AI endpoint with a supported accelerator instance type. Container caching is enabled automatically. To learn more about supported instance types and Regions, see the Amazon SageMaker AI documentation. You can also use the AWS Management Console to create or update your endpoints.
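As a rough illustration of that first step, the following Python sketch creates an endpoint on an accelerator instance type with boto3. It assumes a SageMaker model named `my-llm-model` already exists (for example, created with `create_model`). The endpoint name, variant name, and instance count are placeholders, and `ml.g6.2xlarge` is used only because it appears in the benchmark above. Nothing in the request refers to caching, since no opt-in is needed.

```python
def build_endpoint_requests(endpoint_name, model_name,
                            instance_type="ml.g6.2xlarge", instance_count=1):
    """Build create_endpoint_config and create_endpoint payloads (no AWS calls)."""
    config_name = f"{endpoint_name}-config"
    endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,  # assumed to exist already
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }
    endpoint = {"EndpointName": endpoint_name, "EndpointConfigName": config_name}
    return endpoint_config, endpoint


def deploy(endpoint_config, endpoint):
    """Create the endpoint; needs boto3 and AWS credentials at call time."""
    import boto3  # imported here so the builder above runs without boto3

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_endpoint_config(**endpoint_config)
    sagemaker.create_endpoint(**endpoint)


endpoint_config, endpoint = build_endpoint_requests("demo-endpoint", "my-llm-model")
print(endpoint_config["ProductionVariants"][0]["InstanceType"])
```

Calling `deploy(endpoint_config, endpoint)` would then send both requests to SageMaker AI.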
Looking ahead, we will keep investing in reducing scaling latency even further. Stay tuned.