JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

JetBrains has released Mellum2 and opened its weights under the Apache 2.0 license. The first Mellum release was a 4B dense model focused on code completion. Mellum2 is its successor: a general-purpose model specialized for software engineering. It covers code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and chat-based assistance.
The JetBrains team positions Mellum2 as a "focal model": a fast, specialized component inside larger AI systems, not a standalone replacement for frontier models.
Architecture
Mellum2 uses a Mixture-of-Experts (MoE) architecture with 12B total parameters and 2.5B active parameters per token. In an MoE model, only a small subset of the parameters is active for any given token. Here, the model has 64 experts and activates 8 of them per token. This keeps per-token compute comparable to a 2.5B dense model, while the total parameter count provides more capacity for specialization.
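The expert selection described above can be sketched in a few lines. This is a generic top-k router, not Mellum2's actual routing code: the softmax over only the selected logits and the random router scores are assumptions made for illustration, and only the 64/8 expert counts and the 12B/2.5B parameter figures come from the article.

```python
import math
import random

NUM_EXPERTS = 64  # total experts, from the article
TOP_K = 8         # experts activated per token, from the article


def route_token(router_logits, top_k=TOP_K):
    """Return {expert_index: weight} for the top_k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only.
    Real routers may normalize differently or add shared experts.
    """
    # Indices of the top_k largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax restricted to the selected experts (max subtracted for stability).
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

print(len(weights), "of", NUM_EXPERTS, "experts active")
print("fraction of experts used:", TOP_K / NUM_EXPERTS)        # 0.125
print("fraction of parameters active:", round(2.5 / 12, 3))    # 0.208
```

The active-parameter share (about 21%) is larger than the active-expert share (12.5%), presumably because non-expert parameters such as attention and embeddings run for every token.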
Key architecture details:
- Layers: 28
- Hidden size: 2304
- MoE experts: 64 total, 8 active per token
- Attention: Grouped-Query Attention (GQA) with 32 query heads and 4 KV heads
- Sliding Window Attention (SWA): Used in three out of every four layers, with a window size of 1,024. Full attention runs on the remaining layers.
- Context length: 131,072 tokens
- Multi-Token Prediction (MTP) head: Serves as an auxiliary pre-training objective and as a built-in draft model for speculative decoding
- Precision: bfloat16
- Vocabulary size: 98,304
The model handles natural language and code. It is not multimodal: it accepts no image or video input.
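To get a feel for what the 3-in-4 sliding-window layout means at full context, the sketch below counts how many token positions each layer would need to keep in its KV cache. It assumes that every fourth layer is the full-attention one and that a sliding-window layer only needs its last 1,024 positions; the article states the ratio and the window size but not the exact layer pattern, and a given serving engine may not realize this saving.

```python
NUM_LAYERS = 28      # from the article
WINDOW = 1024        # sliding-window size, from the article
CONTEXT = 131_072    # context length, from the article


def is_full_attention(layer_idx):
    # Assumption: layers 3, 7, 11, ... use full attention; the rest use SWA.
    return layer_idx % 4 == 3


def cached_positions(context_len, num_layers=NUM_LAYERS, window=WINDOW):
    """Token positions kept in the KV cache, summed over all layers."""
    total = 0
    for layer in range(num_layers):
        if is_full_attention(layer):
            total += context_len                 # full attention sees everything
        else:
            total += min(context_len, window)    # SWA sees only the last window
    return total


hybrid = cached_positions(CONTEXT)
all_full = NUM_LAYERS * CONTEXT
print(hybrid, all_full, round(hybrid / all_full, 3))  # 939008 3670016 0.256
```

Under these assumptions the hybrid layout caches roughly a quarter of the positions that 28 full-attention layers would at 131,072 tokens.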
Pre-Training
Pre-training covers 10.6 trillion tokens using a three-stage curriculum. Across the three stages, the data mix shifts gradually from diverse web content toward curated code and math content.
Training used the Muon optimizer under hybrid FP8 precision, with a Warmup-Hold-Decay learning-rate schedule that decays linearly to zero.
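A Warmup-Hold-Decay schedule with linear decay to zero can be sketched as follows. The linear warmup shape and every number in the usage lines (step counts, peak learning rate) are hypothetical; the article gives only the schedule's name and its linear decay to zero.

```python
def whd_learning_rate(step, total_steps, peak_lr, warmup_steps, hold_steps):
    """Learning rate at `step` for a Warmup-Hold-Decay schedule.

    Assumption: warmup is linear from 0 to peak_lr. The decay phase is
    linear from peak_lr down to 0, as described in the article.
    """
    decay_steps = total_steps - warmup_steps - hold_steps
    if decay_steps <= 0:
        raise ValueError("warmup_steps + hold_steps must be below total_steps")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # warmup
    if step < warmup_steps + hold_steps:
        return peak_lr                                # hold
    progress = (step - warmup_steps - hold_steps) / decay_steps
    return max(0.0, peak_lr * (1.0 - progress))       # linear decay to zero


# Hypothetical settings, for illustration only.
for s in (0, 50, 100, 699, 850, 1000):
    print(s, whd_learning_rate(s, total_steps=1000, peak_lr=1e-3,
                               warmup_steps=100, hold_steps=600))
```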
After pre-training and before post-training began, the base model's context window was extended to 128K tokens using a layer-selective YaRN method.
Model Family
The JetBrains team released six checkpoints that span the full training pipeline:
| Checkpoint | Description |
|---|---|
| Mellum2-12B-A2.5B-Base-Pretrain | Base checkpoint before long-context extension |
| Mellum2-12B-A2.5B-Base | Final base model after context extension |
| Mellum2-12B-A2.5B-Instruct-SFT | Supervised fine-tuned instruction checkpoint |
| Mellum2-12B-A2.5B-Thinking-SFT | Supervised fine-tuned checkpoint for the Thinking variant |
| Mellum2-12B-A2.5B-Instruct | RL-tuned instruction model |
| Mellum2-12B-A2.5B-Thinking | RL-tuned reasoning model |
Post-training follows two stages: supervised fine-tuning (SFT), then reinforcement learning with verifiable rewards (RLVR) on math, executable coding, tool use, instruction following, reasoning, and knowledge tasks.
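As a rough illustration of what a "verifiable reward" can look like for executable coding, the sketch below scores a candidate solution by running it against test cases. This is a generic example and not JetBrains' reward implementation; the function name `code_reward`, the binary 0/1 scoring, and the test cases are all assumptions. A real pipeline would execute untrusted code inside a sandbox.

```python
def code_reward(candidate_source, function_name, test_cases):
    """Return 1.0 if the candidate passes every test case, otherwise 0.0.

    Assumption: a binary pass/fail reward. `test_cases` is a list of
    (args_tuple, expected_result) pairs.
    """
    namespace = {}
    try:
        # WARNING: exec on model output is unsafe outside a sandbox.
        exec(candidate_source, namespace)
        func = namespace[function_name]
        for args, expected in test_cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime errors all score zero.
        return 0.0
    return 1.0


tests = [(("abc",), "cba"), (("",), "")]
good = "def reverse(s):\n    return s[::-1]\n"
bad = "def reverse(s):\n    return s\n"

print(code_reward(good, "reverse", tests))  # 1.0
print(code_reward(bad, "reverse", tests))   # 0.0
```

The point of such a reward is that it can be checked mechanically, without a learned judge.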
The Instruct variant answers directly, without an externalized chain of thought. Use it for low-latency tasks: direct answers, tool use, and instruction following.
The Thinking variant emits an explicit reasoning trace before its final answer. Use it for complex debugging, multi-step planning, or agentic workflows where step-by-step reasoning matters.
Benchmark Results
All numbers below are self-reported by JetBrains. The comparison set consists of open-weight models in the 4B–14B range.
Coding:
| Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B) | Seed-Coder (8B) |
|---|---|---|---|---|---|---|
| LiveCodeBench v6 | 37.2 | 51.0 | 63.7 | 42.4 | 28.2 | 28.1 |
| EvalPlus | 78.4 | 69.4 | 71.8 | 74.1 | 67.3 | 73.8 |
| MultiPL-E | 67.1 | 51.0 | 67.1 | 71.5 | 36.1 | 77.0 |
Tool Use:
| Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B) |
|---|---|---|---|---|---|
| BFCL v3 | 66.3 | 64.1 | 70.5 | 52.7 | 41.9 |
| BFCL v4 | 44.2 | 52.0 | 60.6 | 38.8 | 19.8 |
Math:
| Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B) |
|---|---|---|---|---|---|
| AIME 2025+2026 | 41.7 | 38.3 | 58.3 | 33.3 | 40.0 |
| GSM-Plus | 80.5 | 85.2 | 87.9 | 86.6 | 85.8 |
Knowledge and Chat:
| Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B) |
|---|---|---|---|---|---|
| MMLU-Redux | 78.1 | 87.5 | 91.1 | 85.9 | 71.8 |
| GPQA Diamond | 40.9 | 76.8 | 79.8 | 58.6 | 40.9 |
| IFEval | 75.8 | 82.1 | 83.9 | 67.3 | 83.2 |
| MixEval | 62.2 | 65.9 | 71.1 | 71.2 | 59.4 |
Benchmark notes:
- EvalPlus covers HumanEval+ and MBPP+
- AIME covers AIME 2025 and AIME 2026 (30 questions each)
- BFCL v4 is a macro-average of five subtasks: v1, v2, v3, web search, memory
- Seed-Coder (8B) does not support native tool calling, so no BFCL scores are listed for it
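To read the tables at a glance, the short script below computes where Mellum2 Instruct ranks on each benchmark. The scores are copied from the tables above (Mellum2 is the first entry in each row); the helper name `mellum2_rank` and the tie rule are choices made here for illustration.

```python
# Scores copied from the tables above; the first entry is Mellum2 Instruct.
SCORES = {
    "LiveCodeBench v6": [37.2, 51.0, 63.7, 42.4, 28.2, 28.1],
    "EvalPlus":         [78.4, 69.4, 71.8, 74.1, 67.3, 73.8],
    "MultiPL-E":        [67.1, 51.0, 67.1, 71.5, 36.1, 77.0],
    "BFCL v3":          [66.3, 64.1, 70.5, 52.7, 41.9],
    "BFCL v4":          [44.2, 52.0, 60.6, 38.8, 19.8],
    "AIME 2025+2026":   [41.7, 38.3, 58.3, 33.3, 40.0],
    "GSM-Plus":         [80.5, 85.2, 87.9, 86.6, 85.8],
    "MMLU-Redux":       [78.1, 87.5, 91.1, 85.9, 71.8],
    "GPQA Diamond":     [40.9, 76.8, 79.8, 58.6, 40.9],
    "IFEval":           [75.8, 82.1, 83.9, 67.3, 83.2],
    "MixEval":          [62.2, 65.9, 71.1, 71.2, 59.4],
}


def mellum2_rank(scores):
    """1-based rank of the first entry; tied scores share the better rank."""
    return 1 + sum(1 for s in scores[1:] if s > scores[0])


for name, row in SCORES.items():
    print(f"{name}: rank {mellum2_rank(row)} of {len(row)}")
```

On these numbers Mellum2 ranks first on EvalPlus, second on BFCL v3 and AIME, and last on GSM-Plus.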

Use Cases
JetBrains identifies four production scenarios where Mellum2's latency and efficiency profile matters:
- Routing and orchestration: In a multi-model system, a router analyzes incoming requests and selects the appropriate model or tool for each task. Mellum2's low per-token compute makes it a good fit for this high-frequency classification step.
- Low-latency RAG pipelines: Retrieval-Augmented Generation (RAG) systems retrieve relevant context, summarize it, and generate a response. Mellum2 handles the summarization of retrieved content at lower latency than larger dense models.
- Sub-agents in complex workflows: Agentic pipelines break tasks into steps: context gathering, planning, verification, and execution. Mellum2 can take on the repetitive or latency-sensitive steps instead of routing every step through a single large frontier model.
- Private and on-premises deployment: The Apache 2.0 license permits self-hosting without restrictions. Developers can run Mellum2 on their own infrastructure, keeping code and data under their control.
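The routing scenario in the first bullet could look roughly like this. The sketch only builds the request body for an OpenAI-style chat-completions endpoint and parses the reply; it makes no network call. The route labels, the prompt wording, and the use of a system message are all assumptions, and whether the model's chat template accepts a system role should be checked against the model card.

```python
import json

# Hypothetical route labels for a multi-model system.
ROUTES = ["code_completion", "debugging", "general_chat"]


def build_routing_request(user_message, routes=ROUTES,
                          model="JetBrains/Mellum2-12B-A2.5B-Instruct"):
    """Build an OpenAI-style chat-completions payload for the routing step."""
    system = (
        "Classify the request into exactly one of these labels: "
        + ", ".join(routes)
        + ". Reply with the label only."
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},  # assumption: system role is supported
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 8,       # a label is short, so keep generation cheap
        "temperature": 0.0,    # deterministic classification
    }


def parse_route(reply_text, routes=ROUTES, default="general_chat"):
    """Map the model's reply onto a known label, falling back to a default."""
    cleaned = reply_text.strip().lower()
    for route in routes:
        if route in cleaned:
            return route
    return default


payload = build_routing_request("Why does my loop throw an IndexError?")
print(json.dumps(payload, indent=2))
print(parse_route(" Debugging\n"))  # debugging
```

Capping `max_tokens` and falling back to a default label keeps the router cheap and robust to off-format replies.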
Strengths and Limitations
Strengths:
- The MoE design activates only 2.5B of the 12B parameters per token, so per-token compute is comparable to a 2.5B dense model
- The MTP head enables speculative decoding without a separate draft model
- 131,072-token context window
- Full checkpoint set released: base pretrain, base, SFT, and RL-tuned variants of both Instruct and Thinking
- Apache 2.0 license: permits commercial use, self-hosting, and fine-tuning
- Strong EvalPlus (78.4) and BFCL v3 (66.3) scores relative to the 4B–14B comparison set
- vLLM support, including optional tool calling via --tool-call-parser hermes
Limitations:
- Text and code only: no image or multimodal input
- LiveCodeBench v6 (37.2) trails Qwen3.5 9B (63.7) and Ministral 3 14B (42.4)
- GPQA Diamond (40.9) and MMLU-Redux (78.1) are below most models in the comparison set
- GSM-Plus (80.5) is below every listed comparison model
- Not designed for frontier-level tasks: JetBrains explicitly positions Mellum2 as a component model
Getting Started
Serve Mellum2 with vLLM:
```bash
pip install vllm
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct --max-model-len 131072
```
With tool calling enabled:
```bash
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```
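Once the server is running with tool calling enabled, a client can pass tool definitions through vLLM's OpenAI-compatible chat-completions endpoint. The sketch below is a minimal standard-library client. The `run_tests` tool and its schema are hypothetical, and the base URL assumes vLLM's default local port of 8000.

```python
import json
import urllib.request


def build_tool_request(prompt, model="JetBrains/Mellum2-12B-A2.5B-Instruct"):
    """Build a chat-completions payload that offers one tool to the model."""
    tools = [{
        "type": "function",
        "function": {
            "name": "run_tests",  # hypothetical tool, for illustration only
            "description": "Run the test suite under a path and report failures.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Directory to test."},
                },
                "required": ["path"],
            },
        },
    }]
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


def send(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_tool_request("Run the tests in ./src and summarize failures.")
    reply = send(payload)  # requires the vLLM server from the command above
    # Any tool calls appear on the assistant message, if the model chose to make one.
    print(reply["choices"][0]["message"].get("tool_calls"))
```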
Using the Hugging Face Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("JetBrains/Mellum2-12B-A2.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("JetBrains/Mellum2-12B-A2.5B-Instruct")

messages = [{"role": "user", "content": "Write a Python function to reverse a string."}]

# Apply the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```



