StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

StepFun today released Step 3.7 Flash, a multimodal Mixture-of-Experts model aimed at agentic use cases. It adds native vision input and more reliable tool use over Step 3.5 Flash.
What Is Step 3.7 Flash?
Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder (ViT) for native image understanding.
The model activates roughly 11B parameters per token at inference time. In MoE architectures, only a subset of "expert" sub-networks fires on each forward pass, not the full network. This keeps inference compute close to that of an 11B dense model while retaining the full 198B parameter budget.
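The sparse activation described above can be illustrated with a toy top-k router. This is a minimal sketch, not StepFun's implementation: the article does not state the number of experts or how many fire per token, so `NUM_EXPERTS`, `TOP_K`, the scalar "experts", and the function names are all hypothetical. Only the 11B and 198B figures come from the article.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(token, experts, router_logits, k):
    """Weighted sum of the outputs of only the k selected experts."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](token)  # unselected experts never run
    return out


# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_top_k(logits, TOP_K)
y = moe_layer(1.0, experts, logits, TOP_K)

# Figures from the article: ~11B active out of 198B total parameters.
active_fraction = 11 / 198
print(f"experts used: {[i for i, _ in selected]}, "
      f"active share of parameters: {active_fraction:.1%}")
```

With the article's figures, roughly 5.6% of the parameters participate in any single token's forward pass, which is why per-token compute tracks the active count rather than the total.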
Key specifications:
| Specification | Value |
|---|---|
| Total parameters | 198B (196B language + 1.8B ViT) |
| Active parameters per token | ~11B |
| Context window | 256k tokens |
| Throughput | Up to 400 tokens/second |
| Reasoning levels | Low, medium, high |
| License | Apache 2.0 |
Architecture Notes
The vision encoder runs as a separate 1.8B ViT module. It feeds image representations into the language backbone's context. Step 3.5 Flash had no multimodal support; this is new in 3.7.
Three selectable reasoning levels (low, medium, and high) let developers trade latency for reasoning depth. Low is fast and cheap; high spends more compute per response.
Agentic Coding Performance
On SWE-Bench Pro, Step 3.7 Flash scores 56.26%, up from Step 3.5 Flash's 51.3%, a gain of about 5 percentage points. On Terminal-Bench 2.1, it scores 59.55%, up from 53.37%.
On SWE-MTLG (a multi-task, long-generation coding benchmark), it achieves 72.42%.
Cross-harness consistency on StepFun's internal Step-SWE-Bench:
| Scaffold | Step 3.7 Flash | Step 3.5 Flash |
|---|---|---|
| Hermes Agent | 67.5% | 60.0% |
| OpenClaw | 67.0% | 47.0% |
| KiloCode | 67.5% | 59.0% |
| RooCode | 64.5% | 43.0% |
| Claude Code | 71.5% | 73.0% |
| OpenCode | 64.5% | 57.0% |
Step 3.5 Flash ranges from 43% to 73% across harnesses. Step 3.7 Flash ranges from 64.5% to 71.5%. In production, coding agents often run inside different scaffolds, each with its own prompting conventions and tool schemas. Lower variance across harnesses means more predictable behavior across different setups.
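The spread can be checked directly from the table. The short script below is only a worked calculation on the reported scores; the choice of range and population standard deviation as the spread measures is ours, not StepFun's.

```python
from statistics import mean, pstdev

# Step-SWE-Bench scores from the table: (Step 3.7 Flash, Step 3.5 Flash), in %.
scores = {
    "Hermes Agent": (67.5, 60.0),
    "OpenClaw": (67.0, 47.0),
    "KiloCode": (67.5, 59.0),
    "RooCode": (64.5, 43.0),
    "Claude Code": (71.5, 73.0),
    "OpenCode": (64.5, 57.0),
}


def summarize(values):
    """Min, max, range, mean, and population standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean(values),
        "stdev": pstdev(values),
    }


step37 = summarize([pair[0] for pair in scores.values()])
step35 = summarize([pair[1] for pair in scores.values()])

for name, stats in (("Step 3.7 Flash", step37), ("Step 3.5 Flash", step35)):
    print(f"{name}: {stats['min']}-{stats['max']}% "
          f"(range {stats['range']:.1f} pts, stdev {stats['stdev']:.2f})")
```

The range narrows from 30.0 points to 7.0 points. Note that Claude Code is the one scaffold where Step 3.5 Flash scores higher (73.0% vs. 71.5%).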
Advisor Mode
Step 3.7 Flash supports Advisor Mode, StepFun's implementation of the advisor strategy described by Anthropic. The model runs the agent loop end to end (calling tools, reading results, iterating) and escalates to a larger advisor model only at specific inflection points, such as planning or recovering from repeated failures. Most of the run stays at the smaller model's cost.
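The control flow described above can be sketched as a loop in which a cheap worker model does every step and a larger model is consulted only at two inflection points. This is an illustrative sketch under stated assumptions, not StepFun's or Anthropic's API: `run_with_advisor`, `worker_step`, `advisor`, `AgentState`, and `failure_threshold` are hypothetical names, and the two callables stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AgentState:
    task: str
    history: List[str] = field(default_factory=list)
    consecutive_failures: int = 0
    advisor_calls: int = 0


def run_with_advisor(
    task: str,
    worker_step: Callable[[AgentState], Tuple[bool, bool, str]],
    advisor: Callable[[str], str],
    max_steps: int = 20,
    failure_threshold: int = 2,  # hypothetical escalation trigger
) -> AgentState:
    """Run a worker loop, escalating to the advisor only when needed.

    worker_step returns (step_succeeded, task_done, note).
    """
    state = AgentState(task=task)

    # Inflection point 1: ask the larger model for an initial plan.
    state.history.append("advisor: " + advisor(f"Plan this task: {task}"))
    state.advisor_calls += 1

    for _ in range(max_steps):
        ok, done, note = worker_step(state)  # cheap model does the work
        state.history.append("worker: " + note)
        if done:
            break
        state.consecutive_failures = 0 if ok else state.consecutive_failures + 1

        # Inflection point 2: recover from repeated failures.
        if state.consecutive_failures >= failure_threshold:
            recent = "\n".join(state.history[-failure_threshold:])
            state.history.append("advisor: " + advisor(f"Stuck on:\n{recent}"))
            state.advisor_calls += 1
            state.consecutive_failures = 0

    return state
```

In this sketch the number of advisor calls is bounded by one plan plus one call per run of `failure_threshold` consecutive failures, so a run with few stalls makes only a handful of calls to the larger model.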
With Advisor Mode enabled on SWE-Bench Verified, StepFun reports that Step 3.7 Flash reaches 97% of Claude Opus 4.6's coding performance at roughly one-ninth of the per-task cost ($0.19 vs. $1.76 per task). These are StepFun's internal numbers.
Multimodal Capabilities
Step 3.7 Flash supports two visual tool pathways:
Visual Search Tool: For recognition tasks where the model's parametric knowledge is insufficient (long-tail entities, newly emerging concepts), it calls a visual search tool to retrieve and verify. On SimpleVQA (with search), it scores 79.16%, comparable to GPT 5.5 (79.11%) and above Kimi K2.6 (78.24%) and GLM 5V Turbo (78.20%).
Python Tool: For fine-grained visual tasks (high-resolution images, visual inspection, bounding-box analysis), it uses a code interface to crop, zoom, and draw pixels or bounding boxes. On V* (with Python), it scores 95.29%. On HR-Bench 4K and HR-Bench 8K, it scores 89.13% and 86.34% respectively.
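To make the Python Tool pathway concrete, the operations named above (crop, zoom, draw a bounding box) can be sketched on a plain grid of pixel values. This is a dependency-free stand-in: the article does not say which imaging library or helper functions the model's code interface exposes, so `crop`, `zoom`, and `draw_box` here are hypothetical, and a real tool would more likely use an imaging library.

```python
def crop(image, box):
    """Cut out a region. box = (left, top, right, bottom), end-exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def zoom(image, factor):
    """Nearest-neighbor upscaling by an integer factor."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def draw_box(image, box, value):
    """Return a copy of image with the border of box set to value."""
    left, top, right, bottom = box
    out = [list(row) for row in image]
    for x in range(left, right):
        out[top][x] = value
        out[bottom - 1][x] = value
    for y in range(top, bottom):
        out[y][left] = value
        out[y][right - 1] = value
    return out


# A 4x4 "image" whose pixel value encodes its position (row * 4 + col).
image = [[r * 4 + c for c in range(4)] for r in range(4)]

region = crop(image, (1, 1, 3, 3))   # [[5, 6], [9, 10]]
magnified = zoom(region, 2)          # each pixel becomes a 2x2 block
marked = draw_box(image, (0, 0, 3, 3), -1)
```

Cropping before zooming is what lets a small region of a high-resolution image be re-examined at a scale where fine detail is legible.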
StepFun notes a behavior observed during evaluation: the model combined visual and non-visual tools without being trained to do so. For example, after generating frontend code, it used a GUI to render and inspect the output before iterating. StepFun describes this as emergent compositional tool use.
On AndroidDaily (long-horizon mobile UI task completion), Step 3.7 Flash scores 61.87%, ahead of Kimi K2.6 (53.36%) and GLM 5V Turbo (51.68%). Gemini 3 Flash (63.21%) leads this benchmark.
Search and Research Benchmarks
StepFun focused this model's search design on planning, evidence filtering, and integrating search into the reasoning loop rather than treating it as a separate add-on.
| Benchmark | Step 3.7 Flash | Notable comparison |
|---|---|---|
| HLE with tools (acc) | 47.20% | DeepSeek V4 Flash: 45.10% |
| BrowseComp (acc) | 75.82% | Claude Opus 4.7: 79.30% |
| DeepSearchQA (F1) | 92.82% | Kimi K2.6: 92.50% |
| ResearchRubrics (score) | 71.68% | GPT 5.5: 61.50% |
Note: The HLE-with-tools score of 47.20% is set against Step 3.5 Flash's text-only score of 35.68%. Step 3.5 Flash did not support tool-augmented evaluation on HLE.
General Agent Benchmarks
| Benchmark | Step 3.7 Flash | Description |
|---|---|---|
| Toolathlon | 49.51% | Multi-tool interaction |
| ClawEval-1.1 | 67.07% | Autonomous everyday task execution in realistic environments |
| GDPval (44 occupations) | 45.8% | General professional task execution |
| Tau2-bench Telecom | >98% | Across all reasoning levels |
On ClawEval-1.1, Step 3.7 Flash (67.07%) leads DeepSeek V4 Flash (57.80%) and DeepSeek V4 Pro (59.80%) among the compared models.
Long-Context Performance
On AA-LCR (a long-context retrieval benchmark, avg@16/acc), Step 3.7 Flash scores 63.94%. This compares with DeepSeek V4 Flash (63.70%) and DeepSeek V4 Pro (66.30%).
Pricing
| Token Type | Price |
|---|---|
| Input (cache miss) | $0.20 / M tokens |
| Input (cache hit) | $0.04 / M tokens |
| Output | $1.15 / M tokens |
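As a worked example of how these rates combine, the helper below estimates the cost of a single request. The prices come from the table; the function name, the `cache_hit_ratio` parameter, and the example token counts are assumptions made for illustration, and actual billing rules may differ.

```python
# Prices in USD per million tokens, taken from the pricing table.
PRICE_PER_M = {
    "input_miss": 0.20,
    "input_hit": 0.04,
    "output": 1.15,
}


def request_cost(input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Estimate the USD cost of one request.

    cache_hit_ratio is the assumed share of input tokens served from cache.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (
        miss_tokens * PRICE_PER_M["input_miss"]
        + hit_tokens * PRICE_PER_M["input_hit"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


# Hypothetical agent turn: 100k tokens of context in, 2k tokens out.
cold = request_cost(100_000, 2_000)                        # about $0.0223
warm = request_cost(100_000, 2_000, cache_hit_ratio=0.8)   # about $0.0095
print(f"cold cache: ${cold:.4f}, 80% cache hits: ${warm:.4f}")
```

For long agent loops that resend a large shared prefix on every turn, the cache-hit rate dominates the input side of the bill, since cached input is one-fifth the price of uncached input.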
Key Takeaways
- Step 3.7 Flash is a 198B sparse MoE model with about 11B active parameters and a 256k context window.
- Native multimodal support (images, GUIs, documents) is new; Step 3.5 Flash was text-only.
- Advisor Mode reaches 97% of Claude Opus 4.6's SWE-Bench Verified performance at $0.19 per task vs. $1.76.
- Cross-harness coding variance narrowed from a 43–73% range (3.5 Flash) to 64.5–71.5% (3.7 Flash).
- Released under Apache 2.0 with BF16, FP8, NVFP4, and GGUF weights on Hugging Face.



