
StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

StepFun today released Step 3.7 Flash, a multimodal Mixture-of-Experts model aimed at agentic use cases. Compared with Step 3.5 Flash, it adds native vision input and more reliable tool use.

What Is Step 3.7 Flash?

Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder (ViT) for native image understanding.

The model activates roughly 11B parameters per token during inference. In an MoE architecture, only a subset of "expert" subnetworks fires on each forward pass, not the full network. This keeps inference compute close to that of an 11B dense model while retaining the full 198B-parameter budget.
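To make the routing idea concrete, here is a toy top-k Mixture-of-Experts layer in plain Python. It is a generic sketch, not StepFun's implementation: the source does not describe Step 3.7 Flash's router, so the expert count, the value of k, the linear gate, and the helper names (`top_k_route`, `moe_forward`) are all illustrative assumptions. What it does show is the key property: the gate scores every expert, but only the k highest-scoring experts actually execute for a given token.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(token: Vector, gate: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Score every expert with a linear gate, keep the top k, renormalize their weights."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(token: Vector, gate: List[Vector],
                experts: List[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Weighted sum of the outputs of only the k routed experts."""
    out = [0.0] * len(token)
    for idx, weight in top_k_route(token, gate, k):
        expert_out = experts[idx](token)  # only routed experts execute
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


# Toy setup (hypothetical sizes): 8 experts, each scales its input and logs the call.
random.seed(0)
DIM, NUM_EXPERTS, K = 4, 8, 2
gate = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
calls: List[int] = []


def make_expert(i: int) -> Callable[[Vector], Vector]:
    def expert(x: Vector) -> Vector:
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(NUM_EXPERTS)]
output = moe_forward([1.0, 0.5, -0.3, 2.0], gate, experts, k=K)
print(f"experts run: {len(calls)} of {NUM_EXPERTS}")  # experts run: 2 of 8

# The same ratio at Step 3.7 Flash's reported scale: ~11B of 198B parameters active.
print(f"active fraction: {11 / 198:.1%}")  # active fraction: 5.6%
```

Production MoE layers batch tokens, run experts as neural sub-networks on accelerators, and add load-balancing terms during training; the toy keeps only the routing logic that explains why per-token compute tracks active rather than total parameters.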

Key specifications:

Specification | Value
Total parameters | 198B (196B language + 1.8B ViT)
Active parameters per token | ~11B
Context window | 256k tokens
Throughput | Up to 400 tokens/second
Reasoning levels | Low, medium, high
License | Apache 2.0

Architecture Notes

The vision encoder runs as a separate 1.8B-parameter ViT module that feeds image representations into the language backbone's context. Step 3.5 Flash had no multimodal support; vision is new in 3.7.

Three selectable reasoning depths (low, medium, and high) let developers trade latency against reasoning depth. Low is fast and cheap; high spends more compute on each response.

Agentic Coding Performance

On SWE-Bench Pro, Step 3.7 Flash scores 56.26%, up from 51.3% for Step 3.5 Flash, a gain of nearly 5 percentage points. On Terminal-Bench 2.1, it scores 59.55%, up from 53.37%.

On SWE-MTLG (a multi-task, long-generation coding benchmark), it reaches 72.42%.

Cross-harness consistency on StepFun's internal Step-SWE-Bench:

Scaffold | Step 3.7 Flash | Step 3.5 Flash
Hermes Agent | 67.5% | 60.0%
OpenClaw | 67.0% | 47.0%
KiloCode | 67.5% | 59.0%
RooCode | 64.5% | 43.0%
Claude Code | 71.5% | 73.0%
OpenCode | 64.5% | 57.0%

Step 3.5 Flash ranges from 43% to 73% across harnesses, while Step 3.7 Flash ranges from 64.5% to 71.5%. Claude Code is the only harness where 3.5 Flash still scores higher (73.0% vs. 71.5%). In production, coding agents often run inside different scaffolds, each with its own prompting conventions and tool schemas, so lower per-harness variance means more predictable behavior across setups.
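The spread claim can be checked directly from the table. The helper below (the name `harness_spread` is ours, not StepFun's) computes each model's mean, min-to-max range, and population standard deviation across the six harnesses; a smaller spread is one simple way to quantify "more predictable across scaffolds".

```python
from statistics import mean, pstdev

# Step-SWE-Bench scores per harness, copied from the table above (StepFun internal figures).
scores = {
    "Step 3.7 Flash": {"Hermes Agent": 67.5, "OpenClaw": 67.0, "KiloCode": 67.5,
                       "RooCode": 64.5, "Claude Code": 71.5, "OpenCode": 64.5},
    "Step 3.5 Flash": {"Hermes Agent": 60.0, "OpenClaw": 47.0, "KiloCode": 59.0,
                       "RooCode": 43.0, "Claude Code": 73.0, "OpenCode": 57.0},
}


def harness_spread(per_harness: dict) -> dict:
    """Summarize how much a model's score depends on the harness it runs in."""
    values = list(per_harness.values())
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "pstdev": pstdev(values),
    }


summary = {model: harness_spread(s) for model, s in scores.items()}
for model, stats in summary.items():
    print(f"{model}: mean {stats['mean']:.1f}%, range {stats['range']:.1f} pts, "
          f"std dev {stats['pstdev']:.1f} pts")
# Step 3.7 Flash: mean 67.1%, range 7.0 pts, std dev 2.4 pts
# Step 3.5 Flash: mean 56.5%, range 30.0 pts, std dev 9.7 pts
```

Range is sensitive to a single outlier harness, which is why the standard deviation is reported alongside it.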

Advisor Mode

Step 3.7 Flash supports Advisor Mode, StepFun's implementation of the advisor strategy described by Anthropic. The model runs the agent loop end to end (calling tools, reading results, iterating) and escalates to a larger advisor model only at specific decision points, such as planning or recovering from repeated failures. Most of the run therefore stays at the cost of the cheaper executor model.
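The escalation pattern can be sketched as a loop in which a cheap executor performs every step and a costlier advisor is consulted only at fixed decision points. This is a hypothetical illustration, not StepFun's Advisor Mode API: the step callables, the `advise` hook, the `RunLog` record, and the "two consecutive failures" trigger are all assumptions chosen to mirror the planning and failure-recovery triggers described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunLog:
    executor_calls: int = 0
    advisor_calls: int = 0
    events: List[str] = field(default_factory=list)


def run_with_advisor(
    steps: List[Callable[[], bool]],
    advise: Callable[[str], None],
    max_consecutive_failures: int = 2,
) -> RunLog:
    """Run every step with the executor; consult the advisor only for the
    initial plan and after repeated consecutive failures (illustrative policy)."""
    log = RunLog()

    # Decision point 1: planning is escalated to the advisor.
    advise("plan")
    log.advisor_calls += 1
    log.events.append("advisor:plan")

    failures = 0
    for i, step in enumerate(steps):
        # The executor (the cheaper model) handles every tool call and iteration.
        log.executor_calls += 1
        if step():
            failures = 0
            log.events.append(f"executor:step{i}:ok")
            continue
        failures += 1
        log.events.append(f"executor:step{i}:fail")
        # Decision point 2: repeated failures trigger a recovery consult.
        if failures >= max_consecutive_failures:
            advise("recover")
            log.advisor_calls += 1
            log.events.append("advisor:recover")
            failures = 0
    return log


# Hypothetical run: each boolean stands in for the outcome of one executor step.
outcomes = [True, False, False, True, True, True]
steps = [lambda ok=ok: ok for ok in outcomes]
log = run_with_advisor(steps, advise=lambda reason: None)
print(log.executor_calls, log.advisor_calls)  # 6 2
```

In this toy run the executor handles all six steps while the advisor is called twice, which is the cost profile the mode is designed around.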

With Advisor Mode enabled on SWE-Bench Verified, StepFun reports that Step 3.7 Flash reaches 97% of Claude Opus 4.6's coding performance at roughly one-ninth of the per-task cost ($0.19 vs. $1.76 per task). These are StepFun's internal numbers.
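The "97%" and "one-ninth" figures follow from the SWE-Bench Verified scores and per-task costs StepFun reports (76.3% and $0.19 for Step 3.7 Flash with Advisor Mode, 78.7% and $1.76 for Claude Opus 4.6). A quick ratio check, taking those internal numbers at face value:

```python
# SWE-Bench Verified figures as reported by StepFun (internal, not independently verified).
flash_advisor = {"score": 76.3, "cost_per_task": 0.19}
opus_46 = {"score": 78.7, "cost_per_task": 1.76}

relative_performance = flash_advisor["score"] / opus_46["score"]
cost_ratio = opus_46["cost_per_task"] / flash_advisor["cost_per_task"]

print(f"relative performance: {relative_performance:.1%}")  # relative performance: 97.0%
print(f"Opus 4.6 costs {cost_ratio:.1f}x more per task")     # Opus 4.6 costs 9.3x more per task
```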

Multimodal Capabilities

Step 3.7 Flash supports two visual tool modes:

Visual Search Tool: For recognition tasks where the model's parametric knowledge falls short (long-tail entities, newly emerging concepts), it calls a visual search tool to retrieve and verify information. On SimpleVQA (with search), it scores 79.16%, on par with GPT 5.5 (79.11%) and ahead of Kimi K2.6 (78.24%) and GLM 5V Turbo (78.20%).

Python Tool: For fine-grained perception tasks (high-resolution images, visual inspection, bounding-box analysis), it uses code interaction to crop, zoom, and draw on pixels or bounding boxes. On V* (self-evaluated with Python), it scores 95.29%. On HR-Bench 4K and HR-Bench 8K, it scores 89.13% and 86.34%, respectively.
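As an illustration of the crop, zoom, and annotate operations the Python tool performs, here is a toy version in pure Python on a tiny grayscale grid, so it runs without any imaging library. StepFun does not publish the tool's interface; the helper names (`crop`, `zoom`, `draw_box`) and the nearest-neighbour zoom are assumptions for illustration only. In practice the model would emit equivalent calls against a real imaging library operating on the actual image.

```python
from typing import List

Image = List[List[int]]  # grayscale image stored as rows of pixel intensities


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def zoom(img: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling: repeat each pixel factor x factor times."""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def draw_box(img: Image, top: int, left: int, height: int, width: int,
             value: int = 255) -> Image:
    """Draw a 1-pixel rectangle outline on a copy of the image."""
    out = [list(row) for row in img]
    bottom, right = top + height - 1, left + width - 1
    for c in range(left, right + 1):
        out[top][c] = value
        out[bottom][c] = value
    for r in range(top, bottom + 1):
        out[r][left] = value
        out[r][right] = value
    return out


# An 8x8 synthetic image with a small bright detail at rows 5-6, columns 2-3.
image = [[0] * 8 for _ in range(8)]
image[5][2] = image[5][3] = image[6][2] = image[6][3] = 200

region = crop(image, top=4, left=1, height=4, width=4)     # isolate the detail
magnified = zoom(region, factor=3)                          # inspect it at 3x
annotated = draw_box(image, top=4, left=1, height=4, width=4)  # mark it on the original
print(len(magnified), len(magnified[0]))  # 12 12
```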

StepFun notes a behavior observed during evaluation: the model combined visual and non-visual tools without being trained to do so. For example, after generating frontend code, it used a GUI to render and inspect the result before iterating. StepFun describes this as emergent compositional tool use.

On AndroidDaily (long-horizon phone UI task completion), Step 3.7 Flash scores 61.87%, ahead of Kimi K2.6 (53.36%) and GLM 5V Turbo (51.68%). Gemini 3 Flash (63.21%) leads this benchmark.

Search and Research Benchmarks

StepFun focused the model's search design on planning, evidence filtering, and synthesis, treating search as part of the reasoning loop rather than as a separate add-on.
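The difference between search as an add-on and search inside the reasoning loop can be sketched as follows: planning, searching, and evidence filtering run in one loop that keeps going until every sub-question has usable evidence, and only then is an answer synthesized. This is a hypothetical toy, not StepFun's design; the in-memory `INDEX`, the `" and "` query-splitting heuristic, the relevance threshold, and every function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory search index: query -> [(snippet, relevance score)].
INDEX: Dict[str, List[Tuple[str, float]]] = {
    "step 3.7 flash total parameters": [
        ("198B total parameters", 0.9),
        ("unrelated forum post", 0.2),
    ],
    "step 3.7 flash context window": [("256k-token context window", 0.85)],
}


def plan(question: str) -> List[str]:
    """Planning: split a compound question into focused sub-queries (toy heuristic)."""
    return [part.strip().lower() for part in question.split(" and ")]


def search(query: str) -> List[Tuple[str, float]]:
    """Stand-in for a real search tool call."""
    return INDEX.get(query, [])


def filter_evidence(hits: List[Tuple[str, float]], threshold: float = 0.5) -> List[str]:
    """Evidence filtering: keep only hits at or above a relevance threshold."""
    return [snippet for snippet, score in hits if score >= threshold]


def research(question: str, max_rounds: int = 3) -> str:
    """Interleave planning, searching, and filtering in one loop, then synthesize."""
    pending = plan(question)
    evidence: List[str] = []
    for _ in range(max_rounds):
        still_open: List[str] = []
        for query in pending:
            kept = filter_evidence(search(query))
            if kept:
                evidence.extend(kept)
            else:
                still_open.append(query)  # no reliable evidence yet
        if not still_open:
            break
        pending = still_open  # a real agent would rewrite these queries here
    # Synthesis: a trivial join stands in for the model's written answer.
    return "; ".join(evidence) if evidence else "insufficient evidence"


print(research("Step 3.7 Flash total parameters and Step 3.7 Flash context window"))
# 198B total parameters; 256k-token context window
```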

Benchmark | Step 3.7 Flash | Notable comparison
HLE with tools (acc) | 47.20% | DeepSeek V4 Flash: 45.10%
BrowseComp (acc) | 75.82% | Claude Opus 4.7: 79.30%
DeepSearchQA (F1) | 92.82% | Kimi K2.6: 92.50%
ResearchRubrics (score) | 71.68% | GPT 5.5: 61.50%

Note: The 47.20% HLE-with-tools score is set against Step 3.5 Flash's text-only score of 35.68%. Step 3.5 Flash did not support tool-augmented evaluation on HLE, so the two figures are not a like-for-like comparison.
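DeepSearchQA is reported as F1 rather than accuracy. For questions whose answer is a set of items, F1 rewards returning the complete set while penalizing padding it with wrong entries. The sketch below shows the generic set-based F1 calculation; DeepSearchQA's exact matching and normalization rules are not given in the source, so the exact-string, case-insensitive matching used here is an assumption.

```python
from typing import Iterable


def set_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 between a predicted answer set and a gold answer set
    (exact, case-insensitive matching; an illustrative assumption)."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    if not pred and not ref:
        return 1.0
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


# Two of three predictions are correct, and two of four gold answers are found.
print(round(set_f1(["A", "B", "X"], ["a", "b", "c", "d"]), 3))  # 0.571
```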

General Agent Benchmarks

Benchmark | Step 3.7 Flash | Description
Toolathlon | 49.51% | Multi-tool interaction
ClawEval-1.1 | 67.07% | Autonomous everyday task execution in realistic environments
GDPval (44 occupations) | 45.8% | General professional task execution
Tau2-bench Telecom | >98% | Across all reasoning-effort levels

On ClawEval-1.1, Step 3.7 Flash (67.07%) leads DeepSeek V4 Flash (57.80%) and DeepSeek V4 Pro (59.80%) among the compared models.

Long-Context Performance

On AA-LCR (a long-context benchmark, scored as avg@16 accuracy), Step 3.7 Flash scores 63.94%, compared with DeepSeek V4 Flash (63.70%) and DeepSeek V4 Pro (66.30%).
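The "avg@16" label indicates the score is averaged over 16 sampled runs. Assuming the usual definition (accuracy per question across k independent generations, then averaged over questions; the source does not spell this out), the metric reduces to:

```python
from statistics import mean
from typing import List


def avg_at_k(correct: List[List[bool]]) -> float:
    """avg@k: per-question accuracy over k sampled runs, averaged over questions."""
    return mean(mean(1.0 if c else 0.0 for c in runs) for runs in correct)


K = 16
results = [
    [True] * 12 + [False] * 4,  # question 1: 12 of 16 runs correct
    [True] * 8 + [False] * 8,   # question 2: 8 of 16 runs correct
]
assert all(len(runs) == K for runs in results)
print(avg_at_k(results))  # 0.625
```

Averaging over many runs smooths out sampling noise, which matters when competing scores differ by only a fraction of a point, as with 63.94% vs. 63.70% here.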

Pricing

Token type | Price
Input (cache miss) | $0.20 / M tokens
Input (cache hit) | $0.04 / M tokens
Output | $1.15 / M tokens
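Cached input is five times cheaper than uncached input, so workloads that repeatedly resend the same long prompt, such as multi-turn agent loops, likely gain the most from cache hits. A minimal cost estimator using the listed prices (the function name and the `cache_hit_ratio` parameter are our own, and actual billing rules may differ):

```python
# Prices per million tokens in USD, from the pricing table above.
PRICE_PER_M = {
    "input_cache_miss": 0.20,
    "input_cache_hit": 0.04,
    "output": 1.15,
}


def request_cost(input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """Estimate the USD cost of one request.
    cache_hit_ratio is the share of input tokens served from the cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    return (miss * PRICE_PER_M["input_cache_miss"]
            + hit * PRICE_PER_M["input_cache_hit"]
            + output_tokens * PRICE_PER_M["output"]) / 1_000_000


# Example agent turn: 100k-token prompt (80% cached) and 2k output tokens.
print(f"${request_cost(100_000, 2_000, cache_hit_ratio=0.8):.4f}")  # $0.0095
```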

Marktechpost Visual Explainer

Slide 1 of 8: Overview

What Is Step 3.7 Flash?

Step 3.7 Flash is a sparse Mixture-of-Experts (MoE) vision-language model from StepFun. It combines a 196B-parameter language backbone with a 1.8B-parameter Vision Transformer (ViT) encoder for native image understanding.

In an MoE model, only a small set of "expert" subnetworks is active for each token, not the full network. This keeps inference compute close to that of an 11B dense model while retaining 198B total parameters.

Context Window

256k tokens

Reasoning Levels

Low / Med / High

Slide 2 of 8: Architecture

Architecture Notes

The 1.8B ViT encoder runs as a separate module and feeds image representations into the language backbone's context. Step 3.5 Flash was text-only; native multimodal support is new in 3.7.

Three reasoning depths let developers balance speed and cost:

  • Low: Fastest and cheapest. Suited to simple completions.
  • Medium: Balanced cost and reasoning depth.
  • High: Most compute per response. Best for complex agentic tasks.

MoE routing means you pay for ~11B active parameters per token, not 198B. This is the core efficiency trade-off in Flash-tier models.

Slide 3 of 8: Agentic Coding

Agentic Coding Performance

Step 3.7 Flash scores 56.26% on SWE-Bench Pro (up from 51.3% for 3.5 Flash) and 59.55% on Terminal-Bench 2.1 (up from 53.37%). On SWE-MTLG it scores 72.42%.

Per-harness scores on StepFun's internal Step-SWE-Bench:

Scaffold | 3.7 Flash | 3.5 Flash
Hermes Agent | 67.5% | 60.0%
OpenClaw | 67.0% | 47.0%
KiloCode | 67.5% | 59.0%
RooCode | 64.5% | 43.0%
Claude Code | 71.5% | 73.0%
OpenCode | 64.5% | 57.0%

3.5 Flash ranges from 43% to 73% across harnesses. 3.7 Flash narrows this to 64.5–71.5%, making it far more predictable across different scaffolds.

Slide 4 of 8: Advisor Mode

Advisor Mode

Step 3.7 Flash supports Advisor Mode, StepFun's implementation of the advisor strategy described by Anthropic. The model runs the full agent loop (calling tools, reading results, iterating) and escalates to a larger advisor model only at specific decision points.

  • Escalates during planning or when recovering from repeated failures
  • Most of the run stays at executor (Flash) cost
  • The larger advisor model is consulted sparingly

SWE-Bench Verified results with Advisor Mode (StepFun internal figures):

Step 3.7 Flash + Advisor: 76.3%
Claude Opus 4.6: 78.7%
Claude Opus 4.6 cost: $1.76 per task

Slide 5 of 8: Multimodal

Multimodal Capabilities

Step 3.7 Flash supports two visual tool modes:

  • Visual Search Tool: Invoked for long-tail entities or newly emerging concepts where parametric knowledge is insufficient. SimpleVQA (with search): 79.16%
  • Python Tool: Code interaction for cropping, zooming, and pixel/bounding-box operations on high-resolution images. V* (Python): 95.29% | HR-Bench 4K: 89.13% | HR-Bench 8K: 86.34%

AndroidDaily (long-horizon phone UI tasks): Step 3.7 Flash scores 61.87%, ahead of Kimi K2.6 (53.36%) and GLM 5V Turbo (51.68%). Gemini 3 Flash leads at 63.21%.

StepFun reports emergent compositional tool use during evaluation: the model combined visual and non-visual tools without explicit training to do so.

Slide 6 of 8: Search & Research

Search and Research Benchmarks

Search is integrated into the model's reasoning loop rather than treated as an external add-on. StepFun focused on search planning, evidence filtering, and synthesis.

Benchmark | 3.7 Flash | Comparison
HLE w/ tools (acc) | 47.20% | DeepSeek V4 Flash: 45.10%
BrowseComp (acc) | 75.82% | Claude Opus 4.7: 79.30%
DeepSearchQA (F1) | 92.82% | Kimi K2.6: 92.50%
ResearchRubrics | 71.68% | GPT 5.5: 61.50%

HLE comparison: Step 3.5 Flash scored 35.68% text-only, while Step 3.7 Flash scores 47.20% with tool access, so these are not apples-to-apples.

Slide 7 of 8: Deployment

Pricing, Deployment, and Ecosystem

Token type | Price
Input (cache miss) | $0.20 / M tokens
Input (cache hit) | $0.04 / M tokens
Output | $1.15 / M tokens

Available on:

StepFun Platform
OpenRouter
NVIDIA NIM
DeepInfra (coming soon)
Fireworks AI (coming soon)
Modal (coming soon)

Inference backends: vLLM, SGLang, Hugging Face Transformers (requires v5.0+), llama.cpp

Quantization formats: BF16, FP8, NVFP4, GGUF

Minimum local setup: 120 GB of unified memory/VRAM
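Why a 120 GB floor when only ~11B parameters are active per token? In a standard setup all 198B parameters must stay resident, because the router may select any expert. A back-of-envelope, weights-only estimate makes the point. It ignores the KV cache, activations, runtime overhead, and the block scale factors that 4-bit formats such as NVFP4 store alongside the weights; GGUF is omitted because its bit width depends on the chosen quantization. The bit widths below are nominal values, not measured file sizes.

```python
# Nominal bits per weight for each format (weights only; scale-factor overhead ignored).
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "NVFP4": 4}
TOTAL_PARAMS = 198e9  # every expert must be resident, not just the ~11B active ones


def weight_memory_gb(params: float, bits: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for fmt, bits in BITS_PER_PARAM.items():
    print(f"{fmt}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
# BF16: ~396 GB
# FP8: ~198 GB
# NVFP4: ~99 GB
```

By this rough estimate, only a roughly 4-bit build fits under 120 GB with some headroom left for the cache and runtime, which is consistent with the stated minimum.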

Slide 8 of 8: Key Takeaways

Key Takeaways

  • 198B sparse MoE model with ~11B active parameters per token and a 256k context window
  • Native multimodal support (images, GUIs, documents); Step 3.5 Flash was text-only
  • Advisor Mode scores 76.3% on SWE-Bench Verified at $0.19/task, versus Claude Opus 4.6 at $1.76/task
  • Cross-harness coding variance narrowed from 43–73% (3.5) to 64.5–71.5% (3.7)
  • Released under Apache 2.0 with BF16, FP8, NVFP4, and GGUF weights on Hugging Face

Compatible harnesses:

Claude Code
KiloCode
Hermes Agent
OpenClaw


Check out the model weights, repo, and technical details.
