Cost-effective multilingual audio transcription with Parakeet-TDT and AWS Batch

Many organizations archive large media libraries, analyze contact center recordings, prepare training data for AI, or process on-demand video for subtitles. When data volumes grow significantly, the cost of managed automatic speech recognition (ASR) services can quickly become the main constraint on scaling.
To address this cost-scalability challenge, we use the NVIDIA Parakeet-TDT-0.6B-v3 model, deployed through AWS Batch on GPU-accelerated instances. Parakeet-TDT's Token-and-Duration Transducer architecture predicts text tokens together with their durations, which lets it skip silence and redundant processing. This helps achieve inference speeds that are orders of magnitude faster than real time. Because you pay only for brief bursts of compute rather than for the full length of your audio, you can transcribe at scale for fractions of a cent per hour of audio, based on the benchmarks described in this post.
In this post, we walk through building a scalable, event-driven transcription pipeline that automatically processes audio files uploaded to Amazon Simple Storage Service (Amazon S3), and we show you how to use Amazon EC2 Spot Instances and buffered streaming inference to reduce costs further.
Model capabilities
Parakeet-TDT-0.6B-v3, released in August 2025, is an open source multilingual ASR model that delivers high accuracy across 25 European languages, with automatic language detection and flexible licensing under CC-BY-4.0. According to metrics published by NVIDIA, the model maintains a word error rate (WER) of 6.34% in clean conditions and 11.66% WER at 0 dB SNR, and it supports audio of up to three hours using local attention mode.
The 25 supported languages are Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, and Ukrainian. This can help reduce the need for separate models or language-specific configuration when serving international European markets.
For deployment on AWS, the model requires GPU-enabled instances with at least 4 GB of VRAM, although 8 GB provides better performance. Based on our testing, G6 instances (NVIDIA L4 GPUs) offer the best price-performance ratio for inference workloads. The model also performs well on G5 (A10G) and G4dn (T4) instances and, for maximum throughput, on P5 (H100) or P4 (A100) instances.
Solution architecture
The process begins when you upload an audio file to an S3 bucket. This triggers an Amazon EventBridge rule that submits a job to AWS Batch. AWS Batch provisions GPU-accelerated compute resources, and the provisioned instances pull our container image, with the model pre-cached, from Amazon Elastic Container Registry (Amazon ECR). The inference script downloads and processes the file, then uploads a timestamped JSON transcript to the output S3 bucket. The architecture scales to zero when idle, so costs are incurred only during active compute.
For a deeper look at the general architectural components, refer to our previous post, Whisper audio transcription powered by AWS Batch and AWS Inferentia.
Figure 1. Event-driven audio transcription pipeline with Amazon EventBridge and AWS Batch
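The hand-off from an S3 upload to a Batch job can be pictured as a small mapping from the EventBridge event to job parameters. The following is a minimal sketch, not the repository's actual wiring: the function name `job_request_from_event`, the parameter names `inputUri` and `outputUri`, and the output key layout are all hypothetical. It relies only on the `detail.bucket.name` and `detail.object.key` fields that Amazon S3 includes in its "Object Created" events.

```python
import re
from pathlib import PurePosixPath


def job_request_from_event(event: dict, output_bucket: str) -> dict:
    """Map an S3 'Object Created' EventBridge event to Batch job parameters.

    The returned dict is only an illustration of the data a job needs;
    the parameter names are hypothetical, not taken from the repository.
    """
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]

    path = PurePosixPath(key)
    # Batch job names allow letters, numbers, hyphens, and underscores
    # and are limited to 128 characters.
    safe_stem = re.sub(r"[^A-Za-z0-9_-]", "-", path.stem)

    return {
        "jobName": f"transcribe-{safe_stem}"[:128],
        "parameters": {
            "inputUri": f"s3://{bucket}/{key}",
            # Keep the prefix and swap the audio extension for .json.
            "outputUri": f"s3://{output_bucket}/{path.with_suffix('.json')}",
        },
    }


example_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "in-bucket"},
        "object": {"key": "calls/2025/meeting_01.wav"},
    },
}
request = job_request_from_event(example_event, "out-bucket")
```

Deriving the output location from the input key keeps each job self-describing, which fits the stateless, retry-friendly job model described later in the post.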
Prerequisites
- Create an AWS account if you don't already have one, and sign in. Create a user with full administrator permissions using AWS IAM Identity Center, as described in Add users.
- Install the AWS Command Line Interface (AWS CLI) on your local development machine and create a profile for the administrator user, as described in Set up the AWS CLI.
- Install Docker on your local machine.
- Clone the GitHub repository to your local machine.
Building the container image
The repository includes a Dockerfile that builds a lean container image optimized for inference performance. The image uses Amazon Linux 2023 as its base, installs Python 3.12, and pre-caches the Parakeet-TDT-0.6B-v3 model at build time to reduce download latency at runtime.
Pushing to Amazon ECR
The repository includes an updateImage.sh script that handles environment detection (CodeBuild or EC2), builds the container image, creates the ECR repository if needed, enables vulnerability scanning, and pushes the image. Run it with:
./updateImage.sh
Deploying the solution
The solution uses an AWS CloudFormation template (deployment.yaml) to provision the infrastructure. The buildArch.sh script automates deployment by detecting your AWS Region, gathering VPC, subnet, and security group information, and deploying the CloudFormation stack:
./buildArch.sh
Under the hood, this runs aws cloudformation deploy against the template.
The CloudFormation template creates an AWS Batch compute environment with G6 and G5 GPU instances, a job queue, a job definition that references your ECR image, and input and output S3 buckets with EventBridge notifications enabled. It also creates an EventBridge rule that triggers a Batch job on S3 upload, an Amazon CloudWatch agent configuration for GPU, CPU, and memory monitoring, and IAM roles with least-privilege policies. AWS Batch lets you select Amazon Linux 2023 GPU images by specifying ImageType: ECS_AL2023_NVIDIA in the compute environment configuration.
Alternatively, you can deploy directly from the AWS CloudFormation console using the launch link provided in the repository README.
Configuring Spot Instances
Amazon EC2 Spot Instances can help reduce costs further by running your workloads on unused EC2 capacity at a discount of up to 90%, depending on the instance type. To enable Spot Instances, we modify the compute environment in deployment.yaml.
You can enable this by setting --parameter-overrides UseSpotInstances=Yes when you run aws cloudformation deploy. The SPOT_PRICE_CAPACITY_OPTIMIZED allocation strategy selects Spot Instance pools that are both the least likely to be interrupted and have the lowest possible price. Diversifying instance types (G6 xlarge, G6 2xlarge, G5 xlarge) can improve Spot availability. Setting MinvCpus: 0 ensures that the environment scales to zero when idle, so you avoid incurring costs between workloads. Because ASR jobs are stateless and idempotent, they are well suited to Spot. If an instance is reclaimed, AWS Batch automatically retries the job (configured with up to 2 retry attempts in the job definition).
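To make the Spot settings concrete, here is a minimal sketch of how the same choices look as the `computeResources` structure accepted by the AWS Batch `CreateComputeEnvironment` API (for example through boto3). It is not a copy of deployment.yaml: the helper name `build_compute_resources`, the `max_vcpus` default, and the On-Demand fallback strategy are assumptions made for illustration. The sketch only builds the dictionary and makes no AWS calls.

```python
def build_compute_resources(
    use_spot: bool,
    subnets: list[str],
    security_group_ids: list[str],
    instance_role: str,
    max_vcpus: int = 64,  # assumed ceiling; size this to your workload
) -> dict:
    """Build a computeResources dict for a GPU Batch compute environment."""
    return {
        "type": "SPOT" if use_spot else "EC2",
        # Strategy named in the post for Spot; the On-Demand fallback
        # shown here is an assumption.
        "allocationStrategy": (
            "SPOT_PRICE_CAPACITY_OPTIMIZED" if use_spot else "BEST_FIT_PROGRESSIVE"
        ),
        "minvCpus": 0,  # scale to zero when the queue is empty
        "maxvCpus": max_vcpus,
        # Several instance types give Batch more Spot pools to choose from.
        "instanceTypes": ["g6.xlarge", "g6.2xlarge", "g5.xlarge"],
        # Amazon Linux 2023 image with NVIDIA drivers.
        "ec2Configuration": [{"imageType": "ECS_AL2023_NVIDIA"}],
        "subnets": subnets,
        "securityGroupIds": security_group_ids,
        "instanceRole": instance_role,
    }


spot_resources = build_compute_resources(
    use_spot=True,
    subnets=["subnet-EXAMPLE"],
    security_group_ids=["sg-EXAMPLE"],
    instance_role="ecsInstanceRole-EXAMPLE",
)
```

Keeping the Spot switch as a single boolean mirrors the UseSpotInstances template parameter, so the same definition can serve both pricing models.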
Managing memory for long audio
The memory consumption of the Parakeet-TDT model scales linearly with audio duration. The Fast Conformer encoder must generate and store feature representations for the entire audio signal, which creates a direct dependency: doubling the audio length roughly doubles VRAM usage. According to the model card, with full attention the model can process up to 24 minutes of audio given 80 GB of VRAM.
NVIDIA addresses this with a local attention mode that supports up to 3 hours of audio on an 80 GB A100.
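As a rough sketch of how local attention can be switched on, the helper below calls NeMo's `change_attention_model` method with the `rel_pos_local_attn` mode. The helper name `enable_local_attention` is hypothetical, and the default context size of 256 frames on each side is an assumption rather than a value taken from this post, so check it against the model card before relying on it.

```python
def enable_local_attention(asr_model, context_size=(256, 256)):
    """Switch a NeMo Fast Conformer ASR model to local (windowed) attention.

    context_size is the (left, right) attention window in encoder frames.
    The default here is an assumption, not a value from this post.
    """
    asr_model.change_attention_model(
        self_attention_model="rel_pos_local_attn",
        att_context_size=list(context_size),
    )
    return asr_model


# Usage, assuming NeMo is installed and a GPU is available:
#
#   import nemo.collections.asr as nemo_asr
#   model = nemo_asr.models.ASRModel.from_pretrained(
#       model_name="nvidia/parakeet-tdt-0.6b-v3"
#   )
#   enable_local_attention(model)
#   result = model.transcribe(["long_audio.wav"])  # hypothetical file name
```

Local attention limits how far each encoder frame can attend, which is what allows longer audio to fit in the same amount of VRAM.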
Buffered streaming inference
For audio longer than 3 hours, or to process long audio cost-effectively on standard hardware such as g6.xlarge, we use buffered streaming inference. Adapted from the NVIDIA NeMo streaming inference example, this technique processes audio in overlapping chunks rather than loading the full context into memory.
We configure 20-second chunks with 5 seconds of left context and 3 seconds of right context to maintain transcription quality at chunk boundaries. Note that accuracy may degrade when you change these parameters, so experiment to find the best configuration. Reducing chunk_secs increases processing time.
Processing audio in fixed-size chunks decouples VRAM usage from total audio length, which allows a single g6.xlarge instance to process a 10-hour file with the same memory footprint as a 10-minute one.
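The windowing idea can be sketched in a few lines of Python. This is an illustration of the chunk layout only, using the 20/5/3-second values from the text; it does not reproduce the NeMo streaming example, and it omits decoding and the merging of per-chunk outputs. The names `Window` and `plan_windows` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    context_start: float  # where the model's input begins (left context)
    chunk_start: float    # start of the span whose text is kept
    chunk_end: float      # end of the span whose text is kept
    context_end: float    # where the model's input ends (right context)


def plan_windows(
    duration_secs: float,
    chunk_secs: float = 20.0,
    left_context_secs: float = 5.0,
    right_context_secs: float = 3.0,
) -> list[Window]:
    """Split audio into fixed-size chunks padded with left/right context."""
    if duration_secs < 0 or chunk_secs <= 0:
        raise ValueError("duration must be >= 0 and chunk_secs must be > 0")
    windows = []
    start = 0.0
    while start < duration_secs:
        end = min(start + chunk_secs, duration_secs)
        windows.append(
            Window(
                context_start=max(0.0, start - left_context_secs),
                chunk_start=start,
                chunk_end=end,
                context_end=min(duration_secs, end + right_context_secs),
            )
        )
        start = end
    return windows


ten_minutes = plan_windows(10 * 60)
ten_hours = plan_windows(10 * 3600)
# The largest model input is the same for both files: chunk + contexts.
longest_short = max(w.context_end - w.context_start for w in ten_minutes)
longest_long = max(w.context_end - w.context_start for w in ten_hours)
```

Because no window is longer than the chunk plus its two contexts (28 seconds with these settings), peak memory depends on the window size rather than on the file length; a longer file only adds more windows.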

Figure 2. Buffered streaming inference processes audio in overlapping chunks with constant memory usage.
To deploy with buffered streaming enabled, set the EnableStreaming=Yes parameter.
Testing and monitoring
To validate the solution at scale, we ran a test with 1,000 identical 50-minute audio files from a NASA preflight crew news conference, distributed across 100 g6.xlarge instances that processed 10 files each.
Figure 3. Batch jobs running concurrently on 100 g6.xlarge instances.
The deployment includes an Amazon CloudWatch agent configuration that collects GPU utilization, power draw, VRAM usage, CPU usage, memory usage, and disk usage at 10-second intervals. These metrics appear under the CWAgent namespace, which lets you build dashboards for real-time monitoring.
Performance and cost analysis
To validate the efficiency of the architecture, we benchmarked the system using multiple long-form audio files.
The Parakeet-TDT-0.6B-v3 model achieved a raw inference speed of 0.24 seconds per minute of audio. However, the full pipeline also includes the overhead of loading the model into memory, loading the audio, preprocessing the input, and postprocessing the output. Because of this overhead, costs are best optimized with long-form audio, where the fixed overhead is spread over more processing time.
Benchmark results (g6.xlarge):
- Audio duration: 3 hours and 25 minutes (205 minutes)
- Total job time: 100 seconds
- Effective processing speed: 0.49 seconds per minute of audio
Cost breakdown
Based on pricing in the us-east-1 Region for the g6.xlarge instance, we can estimate the cost per minute of audio processed.
| Pricing model | Hourly cost (g6.xlarge)* | Cost per minute of audio |
|---|---|---|
| On-Demand | ~$0.805 | **$0.00011** |
| Spot Instances | ~$0.374 | **$0.00005** |
*Prices are estimates based on us-east-1 rates at the time of writing. Spot prices vary by Availability Zone and are subject to change.
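The figures in the table follow from the benchmark numbers by simple arithmetic: divide the hourly price by 3,600 to get a per-second price, then multiply by the seconds of compute needed per minute of audio. The sketch below reproduces that calculation with the values quoted in this post; the function names are hypothetical.

```python
def seconds_per_audio_minute(job_seconds: float, audio_minutes: float) -> float:
    """Effective processing speed: compute seconds per minute of audio."""
    return job_seconds / audio_minutes


def cost_per_audio_minute(hourly_price_usd: float, secs_per_audio_min: float) -> float:
    """Instance cost attributable to one minute of audio."""
    return hourly_price_usd / 3600.0 * secs_per_audio_min


# Benchmark: a 205-minute file processed in 100 seconds on g6.xlarge.
speed = seconds_per_audio_minute(job_seconds=100, audio_minutes=205)  # about 0.49

on_demand = cost_per_audio_minute(0.805, speed)  # rounds to $0.00011
spot = cost_per_audio_minute(0.374, speed)       # rounds to $0.00005

# Scale up to an hour of audio to compare with per-hour pricing.
on_demand_per_audio_hour = on_demand * 60
spot_per_audio_hour = spot * 60

print(f"speed: {speed:.2f} s per audio minute")
print(f"On-Demand: ${on_demand:.5f}/min, ${on_demand_per_audio_hour:.4f}/hour of audio")
print(f"Spot:      ${spot:.5f}/min, ${spot_per_audio_hour:.4f}/hour of audio")
```

With these inputs, both pricing models come out below one cent per hour of audio, which is consistent with the "fractions of a cent per hour" figure cited in the post.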
This comparison highlights the economic advantage of the self-hosted approach for high-volume workloads, which delivers substantial savings per transcript at scale compared with managed API services.
Clean up
To avoid incurring future charges, delete the resources created by this solution:
- Empty all S3 buckets (input, output, and logs).
- Delete the CloudFormation stack:
aws cloudformation delete-stack --stack-name batch-gpu-audio-transcription
- Optionally, delete the ECR repository and container images.
For detailed cleanup instructions, refer to the cleanup section of the repository README.
Conclusion
In this post, we showed how to build an audio transcription pipeline that processes audio at scale for fractions of a cent per hour. By combining NVIDIA's Parakeet-TDT-0.6B-v3 model with AWS Batch and EC2 Spot Instances, you can transcribe audio in 25 European languages with automatic language detection and help reduce costs compared with alternative solutions. The buffered streaming inference technique extends this capability to audio of varying lengths on standard hardware, and the event-driven architecture scales automatically from zero to handle variable workloads.
To get started, explore the sample code in the GitHub repository.