
JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

JetBrains has released Mellum2 and opened its weights under the Apache 2.0 license. The first Mellum release was a 4B dense model focused on code completion. Mellum2 is its successor: a general-purpose model specialized for software engineering. It covers code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and chat-based assistance.

The JetBrains team positions Mellum2 as a "focal model": a fast, specialized component inside larger AI systems, not a standalone replacement for frontier models.

Architecture

Mellum2 uses a Mixture-of-Experts (MoE) architecture with 12B total parameters and 2.5B active parameters per token. In MoE models, only a small subset of the parameters is active for each token. Here, the model has 64 experts and activates 8 of them per token. This keeps per-token compute comparable to a 2.5B dense model, while the total parameter count provides more capacity for specialization.
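The routing idea described above can be sketched in a few lines. This is a generic top-k router written for illustration, not JetBrains' implementation: the random router logits and the choice to apply softmax only over the selected experts are assumptions, and only the 64/8 expert counts come from the article.

```python
import math
import random

NUM_EXPERTS = 64  # total experts per MoE layer (from the article)
TOP_K = 8         # experts activated per token (from the article)


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k experts of one token."""
    # Indices of the top_k largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only, so the mixing weights sum to 1.
    # (ASSUMPTION: real routers may normalize differently.)
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # hypothetical router output
selected = route_token(logits)
print(len(selected), "of", NUM_EXPERTS, "experts active for this token")
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.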

Key architecture details:

  • Layers: 28
  • Hidden size: 2304
  • MoE experts: 64 total, 8 active per token
  • Attention: Grouped-Query Attention (GQA) with 32 query heads and 4 KV heads
  • Sliding Window Attention (SWA): Used in three out of every four layers, with a window size of 1,024. Full attention runs in the remaining layers.
  • Context length: 131,072 tokens
  • Multi-Token Prediction (MTP) head: Serves as an auxiliary pre-training objective and as a built-in draft model for speculative decoding
  • Precision: bfloat16
  • Vocabulary size: 98,304
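To see why GQA and SWA matter at long context, here is a rough KV-cache estimate built from the figures in the list above. It is a back-of-the-envelope sketch: the head dimension is an assumption (hidden size divided by query heads, which gives 72; the real value is not stated in the article), and the cache is counted for a single sequence with no framework overhead.

```python
LAYERS = 28
HIDDEN = 2304
Q_HEADS = 32
KV_HEADS = 4
WINDOW = 1024
CONTEXT = 131_072
BYTES_PER_VALUE = 2  # bfloat16

HEAD_DIM = HIDDEN // Q_HEADS       # ASSUMPTION: 72, not confirmed by the article
SWA_LAYERS = LAYERS * 3 // 4       # three out of four layers use the sliding window
FULL_LAYERS = LAYERS - SWA_LAYERS  # the remaining layers use full attention


def kv_cache_bytes(context_len, use_swa=True):
    """Estimated KV-cache size in bytes for one sequence of context_len tokens."""
    # Keys and values (factor 2) for every KV head, per token and per layer.
    per_token_per_layer = 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    # Sliding-window layers only keep the most recent WINDOW tokens.
    swa_tokens = min(context_len, WINDOW) if use_swa else context_len
    cached_tokens = FULL_LAYERS * context_len + SWA_LAYERS * swa_tokens
    return per_token_per_layer * cached_tokens


with_swa = kv_cache_bytes(CONTEXT)
without_swa = kv_cache_bytes(CONTEXT, use_swa=False)
print(f"with SWA:    {with_swa / 2**30:.2f} GiB")
print(f"without SWA: {without_swa / 2**30:.2f} GiB")
```

Under these assumptions the sliding-window layers contribute very little at full context, so the cache is dominated by the seven full-attention layers.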

The model handles natural language and code. It is not multimodal: there is no image or video input.

Pre-Training

Pre-training spans 10.6 trillion tokens and follows a three-stage curriculum. Across the three stages, the data mix gradually shifts from diverse web content to curated code and math content.

Training used the Muon optimizer under FP8 hybrid precision, with a Warmup-Hold-Decay learning rate schedule that decays linearly to zero.
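A Warmup-Hold-Decay schedule with linear decay to zero can be written as a small function. The sketch below is illustrative only: the linear shape of the warmup, the function name `whd_lr`, and all step counts and the peak learning rate are assumptions, since the article does not give them.

```python
def whd_lr(step, peak_lr, warmup_steps, hold_steps, decay_steps):
    """Learning rate at a given step for a Warmup-Hold-Decay schedule."""
    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak (ASSUMPTION: linear warmup).
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + hold_steps:
        # Hold: stay at the peak learning rate.
        return peak_lr
    # Decay: fall linearly from the peak to zero, then stay at zero.
    decay_step = step - warmup_steps - hold_steps
    remaining = max(0.0, 1.0 - decay_step / decay_steps)
    return peak_lr * remaining


# Hypothetical numbers, chosen only to make the three phases visible.
for s in (0, 9, 15, 35, 40):
    print(s, whd_lr(s, peak_lr=1.0, warmup_steps=10, hold_steps=20, decay_steps=10))
```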

After pre-training, the base model's context window was extended to 128K tokens using a layer-selective YaRN method before post-training began.

Model Family

The JetBrains team released six checkpoints covering the full training pipeline:

Checkpoint | Description
Mellum2-12B-A2.5B-Base-Pretrain | Base checkpoint before long-context extension
Mellum2-12B-A2.5B-Base | Final base model after context extension
Mellum2-12B-A2.5B-Instruct-SFT | Supervised fine-tuned instruct checkpoint
Mellum2-12B-A2.5B-Thinking-SFT | Supervised fine-tuned thinking checkpoint
Mellum2-12B-A2.5B-Instruct | RL-tuned instruct model
Mellum2-12B-A2.5B-Thinking | RL-tuned thinking model

Post-training follows two stages: supervised fine-tuning (SFT), then reinforcement learning with verifiable rewards (RLVR) on math, executable coding, tool use, instruction following, reasoning, and knowledge tasks.
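The phrase "verifiable rewards" means the reward comes from a programmatic check rather than a learned judge. The sketch below shows what such checks might look like for math and executable coding. It is a toy illustration, not JetBrains' reward code: the function names, the normalization rules, and the convention that generated code defines a function called `solve` are all assumptions.

```python
def math_reward(model_answer, reference):
    """1.0 if the final answer matches the reference after light normalization."""
    def normalize(text):
        return text.strip().rstrip(".").replace(",", "")
    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0


def code_reward(source, test_cases):
    """Fraction of (args, expected) cases passed by `solve` defined in source."""
    namespace = {}
    try:
        # NOTE: a real pipeline would run untrusted code inside a sandbox.
        exec(source, namespace)
        solve = namespace["solve"]  # hypothetical naming convention
    except Exception:
        return 0.0  # code that fails to load earns no reward
    passed = 0
    for args, expected in test_cases:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case counts as a failure
    return passed / len(test_cases)


cases = [(("abc",), "cba"), (("",), "")]
print(math_reward("1,024.", "1024"))
print(code_reward("def solve(s):\n    return s[::-1]", cases))
```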

The Instruct variant responds directly, without an explicit chain of thought. Use it for low-latency tasks: direct answers, tool use, and instruction following.

The Thinking variant emits an explicit reasoning trace before its final answer. Use it for complex debugging, multi-step planning, or agentic workflows where step-by-step reasoning matters.

Benchmark Results

All numbers below are self-reported by JetBrains. The comparison set is open-weight models in the 4B–14B range.

Coding:

Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B) | Seed-Coder (8B)
LiveCodeBench v6 | 37.2 | 51.0 | 63.7 | 42.4 | 28.2 | 28.1
EvalPlus | 78.4 | 69.4 | 71.8 | 74.1 | 67.3 | 73.8
MultiPL-E | 67.1 | 51.0 | 67.1 | 71.5 | 36.1 | 77.0

Tool Use:

Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B)
BFCL v3 | 66.3 | 64.1 | 70.5 | 52.7 | 41.9
BFCL v4 | 44.2 | 52.0 | 60.6 | 38.8 | 19.8

Math:

Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B)
AIME 2025+2026 | 41.7 | 38.3 | 58.3 | 33.3 | 40.0
GSM-Plus | 80.5 | 85.2 | 87.9 | 86.6 | 85.8

Knowledge and Chat:

Benchmark | Mellum2 Instruct | Qwen3.5 (4B) | Qwen3.5 (9B) | Ministral 3 (14B) | OLMo-3 (7B)
MMLU-Redux | 78.1 | 87.5 | 91.1 | 85.9 | 71.8
GPQA Diamond | 40.9 | 76.8 | 79.8 | 58.6 | 40.9
IFEval | 75.8 | 82.1 | 83.9 | 67.3 | 83.2
MixEval | 62.2 | 65.9 | 71.1 | 71.2 | 59.4

Benchmark notes:

  • EvalPlus refers to HumanEval+ and MBPP+
  • AIME refers to AIME 2025 and AIME 2026 (30 questions each)
  • BFCL v4 is a macro-average over five sub-tasks: v1, v2, v3, web search, and memory
  • Seed-Coder (8B) does not support native tool calling, so no BFCL scores are listed for it

Use Cases

JetBrains identifies four production scenarios where Mellum2's latency and efficiency profile matters:

  • Routing and orchestration: In a multi-model system, a router analyzes incoming prompts and selects the appropriate model or tool for each task. Mellum2's low per-token compute makes it suitable for this high-frequency classification step.
  • Low-latency RAG pipelines: Retrieval-Augmented Generation (RAG) systems retrieve relevant context, summarize it, and generate a response. Mellum2 handles summarization of retrieved content at lower latency than larger dense models.
  • Sub-agents in complex workflows: Agentic pipelines break tasks into steps: context gathering, planning, verification, and execution. Mellum2 can handle repetitive or latency-sensitive steps instead of routing every step through a single large frontier model.
  • Private and on-premises deployment: The Apache 2.0 license permits self-hosting without restrictions. Developers can run Mellum2 on their own infrastructure, keeping code and data under their control.
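The routing scenario above could look roughly like the following. This is a minimal sketch under several assumptions: the three route labels are hypothetical backend names, the system prompt is invented for illustration, and the model is assumed to be reachable through an OpenAI-compatible chat completions endpoint such as the one vLLM exposes. The network call is injected as a `send` function so the logic can be read and exercised without a running server.

```python
ROUTE_LABELS = ("instruct", "thinking", "frontier")  # hypothetical backend names


def build_router_request(prompt, model="JetBrains/Mellum2-12B-A2.5B-Instruct"):
    """Build a chat-completions body that asks the model for a single route label."""
    system = ("Classify the user request. Reply with exactly one word: "
              + ", ".join(ROUTE_LABELS) + ".")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 8,     # the answer should be a single short word
        "temperature": 0.0,  # keep the classification as repeatable as possible
    }


def parse_route(reply, default="frontier"):
    """Map the router's raw text reply to a known label, else fall back to default."""
    words = reply.strip().lower().split()
    word = words[0].strip(".,!\"'") if words else ""
    return word if word in ROUTE_LABELS else default


def route(prompt, send):
    """send(body) -> reply text. Injected so the sketch needs no live server."""
    return parse_route(send(build_router_request(prompt)))


# Stand-in for a real HTTP call to the serving endpoint.
print(route("Rename this variable everywhere", lambda body: "Instruct."))
```

Falling back to the largest model when the reply is unrecognized is one possible design choice: it trades cost for safety when the router's output cannot be trusted.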

Strengths and Limitations

Strengths:

  • The MoE design activates only 2.5B of the 12B parameters per token, giving per-token compute comparable to a 2.5B dense model
  • The MTP head enables speculative decoding without a separate draft model
  • 131,072-token context window
  • Full checkpoint set released: base pretrain, base, SFT, and RL-tuned variants for both Instruct and Thinking
  • Apache 2.0 license: permits commercial use, self-hosting, and fine-tuning
  • Strong EvalPlus (78.4) and BFCL v3 (66.3) scores relative to the 4B–14B comparison set
  • vLLM support, including optional tool calling with --tool-call-parser hermes

Limitations:

  • Text and code only: no image or other multimodal input
  • LiveCodeBench v6 (37.2) trails Qwen3.5 9B (63.7) and Ministral 3 14B (42.4)
  • GPQA Diamond (40.9) and MMLU-Redux (78.1) are below most models in the comparison set
  • GSM-Plus (80.5) is below every comparison model listed
  • Not designed for frontier-level tasks: JetBrains explicitly positions Mellum2 as a component model
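Speculative decoding, listed among the strengths above, lets a cheap draft propose several tokens that the main model then verifies. The toy below shows the greedy form of the idea with two plain Python functions standing in for the models. It is a generic illustration, not how Mellum2's MTP head is wired: `toy_target` and `toy_draft` are made-up functions, and a real system would verify all proposed tokens in one batched forward pass instead of one call per token.

```python
def greedy_decode(target, prompt, num_new):
    """Baseline: ask the target model for one token at a time."""
    seq = list(prompt)
    for _ in range(num_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target, draft, prompt, num_new, k=4):
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    seq = list(prompt)
    goal = len(prompt) + num_new
    while len(seq) < goal:
        # 1. The draft proposes up to k tokens autoregressively.
        proposal = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed token in order.
        for i, token in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if expected != token:
                # Keep the agreed prefix, then take the target's own token.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            seq.extend(proposal)  # every proposed token was accepted
    return seq


def toy_target(seq):
    """Made-up deterministic 'model' over a 50-token vocabulary."""
    return (sum(seq) * 31 + 7) % 50


def toy_draft(seq):
    """Agrees with the target except when the sequence length is a multiple of 3."""
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 50


prompt = [3, 1, 4]
same = speculative_decode(toy_target, toy_draft, prompt, 12) == greedy_decode(toy_target, prompt, 12)
print("matches plain greedy decoding:", same)
```

The output is identical to plain greedy decoding by construction. Any speed gain comes from accepting several draft tokens per verification step.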

Marktechpost Visual Explainer

Overview

JetBrains Open-Sources Mellum2

A 12B Mixture-of-Experts model released under Apache 2.0 on June 2, 2026. Trained from scratch on 10.6 trillion tokens for software engineering tasks.

Architecture

How Mellum2 Is Built

The MoE activates 8 of 64 experts per token, so per-token compute stays comparable to a 2.5B dense model. The MTP head enables speculative decoding without a separate draft model.

  • Experts (total / active): 64 / 8
  • SWA window: 1,024 (¾ of layers)

Pre-Training

Training Pipeline

A three-stage curriculum gradually shifts from diverse web data to curated code and math. Context is extended to 128K with layer-selective YaRN before post-training.

  • Data: ~10.6 trillion tokens across three curriculum stages
  • Optimizer: Muon under FP8 hybrid precision
  • LR schedule: Warmup-Hold-Decay with linear decay to zero
  • Context extension: Layer-selective YaRN to 128K tokens
  • Post-training: SFT → RLVR on coding, math, tool use, reasoning, knowledge
  • Design constraint: Inference efficiency on commodity GPUs, validated through ablations

Model Family

Six Checkpoints Released

The full pipeline, from base pretrain through RL-tuned variants. Use Instruct for direct, low-latency answers. Use Thinking for explicit step-by-step reasoning.

Stage | Checkpoint | Notes
BASE | Mellum2-12B-A2.5B-Base-Pretrain | Before context extension
BASE | Mellum2-12B-A2.5B-Base | After YaRN extension
SFT | Mellum2-12B-A2.5B-Instruct-SFT | Supervised instruct
SFT | Mellum2-12B-A2.5B-Thinking-SFT | Supervised thinking
RLVR | Mellum2-12B-A2.5B-Instruct | RL-tuned, no CoT
RLVR | Mellum2-12B-A2.5B-Thinking | RL-tuned, explicit CoT

Benchmarks

Evaluation Results (Instruct Variant)

All numbers are self-reported by JetBrains. Comparison set: open-weight models in the 4B–14B range.

Benchmark | Mellum2 | Qwen3.5 9B | Ministral 3 14B | OLMo-3 7B
LiveCodeBench v6 | 37.2 | 63.7 | 42.4 | 28.2
EvalPlus | 78.4 | 71.8 | 74.1 | 67.3
MultiPL-E | 67.1 | 67.1 | 71.5 | 36.1
BFCL v3 | 66.3 | 70.5 | 52.7 | 41.9
AIME 2025+2026 | 41.7 | 58.3 | 33.3 | 40.0
IFEval | 75.8 | 83.9 | 67.3 | 83.2

Use Cases

Where Mellum2 Fits in Production

JetBrains positions Mellum2 as a "focal model": it handles high-frequency, latency-sensitive steps inside larger AI pipelines.

  • Routing & Orchestration: Analyze prompts and select the right model or tool for each task
  • RAG Pipelines: Summarize retrieved context at low latency before generating a response
  • Sub-Agents: Handle repetitive steps in agentic pipelines (context gathering, verification, planning)
  • Private Deployment: Apache 2.0 permits full self-hosting with no external API calls required

Strengths and Limitations

What Works and What Doesn't

Mellum2 is built for efficiency in component roles, not for frontier-level performance on every benchmark.

✓ Strengths

  • 2.5B active parameters: compute comparable to a 2.5B dense model
  • MTP head enables built-in speculative decoding
  • 131K-token context window
  • Strong EvalPlus (78.4) and BFCL v3 (66.3)
  • Apache 2.0: commercial use, fine-tuning, self-hosting
  • vLLM support with tool calling

✗ Limitations

  • Text and code only: no multimodal input
  • LiveCodeBench v6 (37.2) below Qwen3.5 9B (63.7)
  • GPQA Diamond (40.9) below most comparison models
  • GSM-Plus (80.5) trails all listed models
  • Not a frontier replacement: component role only

Model weights: huggingface.co/JetBrains/mellum-2 · Technical report: arXiv:2605.31268

Getting Started

Serve Mellum2 with vLLM:

pip install vllm
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct --max-model-len 131072

With tool calling enabled:

vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
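Once the server is running with tool calling enabled, a client can send tool definitions in the OpenAI chat completions format, which vLLM's server accepts. The sketch below only builds the request body and parses a response, so it runs without a server. The `run_tests` tool, its schema, and the sample response are hypothetical and exist only for illustration.

```python
import json


def build_tool_request(user_text, model="JetBrains/Mellum2-12B-A2.5B-Instruct"):
    """Build a chat-completions body that offers the model one callable tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "run_tests",  # hypothetical tool name
                "description": "Run the project's test suite for one file.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


def extract_tool_calls(response):
    """Return [(tool_name, arguments_dict), ...] from a chat-completions response."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # Arguments arrive as a JSON-encoded string and need to be decoded.
    return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in calls]


# Hand-written sample response in the OpenAI format, used here instead of a live call.
sample = {"choices": [{"message": {
    "role": "assistant", "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "run_tests",
                                 "arguments": "{\"path\": \"utils.py\"}"}}],
}}]}
print(json.dumps(build_tool_request("Run the tests for utils.py"))[:60], "...")
print(extract_tool_calls(sample))
```

In practice the body would be sent as JSON in a POST request to the server's /v1/chat/completions route, which vLLM serves on port 8000 by default.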

Using the Hugging Face Transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the Instruct checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("JetBrains/Mellum2-12B-A2.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("JetBrains/Mellum2-12B-A2.5B-Instruct")

# Format the conversation with the model's chat template and tokenize it.
messages = [{"role": "user", "content": "Write a Python function to reverse a string."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate up to 512 new tokens, then decode only the newly generated part.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
