Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on 1/32 Activation-Ratio MoE Architecture

A group of researchers from China has released AntAngelMed, an open-source medical language model that they describe as the largest and most capable of its kind currently available.
What is AntAngelMed?
AntAngelMed is a medical-domain language model with 103 billion parameters in total, but it does not use all of them for every computation. Instead, it uses a Mixture-of-Experts (MoE) architecture with an activation ratio of 1/32, meaning only 6.1 billion parameters are active whenever a query is processed.
It helps to know how MoE architectures work. In a typical dense model, every parameter participates in processing every token. In an MoE model, the network is divided into 'expert' sub-networks, and a routing mechanism selects a small subset of them to handle each input. This lets a model carry a very large total parameter count – often associated with stronger capability – while keeping the actual compute cost proportional to the much smaller number of active parameters.
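The routing idea above can be sketched in a few lines. This is a generic illustrative top-k softmax router, not AntAngelMed's actual routing code; the expert count, dimensions, and `moe_route` helper are assumptions for the sketch.

```python
import numpy as np

def moe_route(token: np.ndarray, experts: list, router_w: np.ndarray, k: int = 2):
    """Route one token through the top-k of n experts.

    Only k expert sub-networks run per token, so compute scales with
    roughly k/n of the total expert parameters.
    """
    logits = router_w @ token                      # one score per expert
    topk = np.argsort(logits)[-k:]                 # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                           # softmax over the selected experts
    # Weighted sum of only the chosen experts' outputs
    return sum(g * experts[i](token) for g, i in zip(gates, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 32                               # toy sizes, mirroring a 1/16 ratio here
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
out = moe_route(rng.normal(size=d), experts, router_w, k=2)  # only 2 of 32 experts execute
```

Scaling this picture up, a 1/32 activation ratio means that for every token only a small fixed fraction of the expert parameters is touched, which is how a 103B-parameter model runs at roughly 6.1B-parameter cost.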
AntAngelMed inherits this design from Ling-flash-2.0, a base model developed by inclusionAI and guided by what the team calls the Ling Scaling Laws. Architectural refinements on top include refined expert granularity, a shared-expert ratio, balanced attention design, aux-loss-free sigmoid routing, an MTP (Multi-Token Prediction) layer, QK-Norm, and Partial-RoPE (Rotary Position Embedding applied to only a subset of each attention head's dimensions). According to the research team, these design choices together let small-activation MoE models deliver 7× the efficiency of dense architectures of comparable cost, meaning that with only 6.1B activated parameters, AntAngelMed can nearly match the performance of a 40B dense model. Separately, as output length grows during generation, the relative speed advantage can reach 7× or more over dense models of the same size.
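Of the refinements listed above, Partial-RoPE is easy to show concretely: rotary position embedding is applied to only the leading fraction of a query/key head's dimensions, while the remaining dimensions pass through untouched. The sketch below is a minimal illustration under assumed sizes; the `rope_frac` split and `partial_rope` helper are not taken from the model's actual implementation.

```python
import numpy as np

def partial_rope(q: np.ndarray, pos: int, rope_frac: float = 0.5,
                 base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to only the first `rope_frac`
    of a head's dimensions; the tail passes through unrotated."""
    d = q.shape[-1]
    d_rope = int(d * rope_frac)          # dims that receive RoPE (kept even)
    half = d_rope // 2
    inv_freq = base ** (-np.arange(half) / half)
    angles = pos * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = q[:half], q[half:d_rope]    # rotate dimension pairs (x1, x2)
    rotated = np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])
    return np.concatenate([rotated, q[d_rope:]])  # tail dims unchanged

q = np.ones(8)
out = partial_rope(q, pos=3, rope_frac=0.5)
```

Only the first four of the eight dimensions are rotated here; the last four come out identical to the input, which is the defining property of the partial variant.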

Training Pipeline
AntAngelMed uses a three-stage training process designed to layer deep medical specialization on top of general language understanding.
The first stage is continued pre-training on large medical corpora, including encyclopedias, web text, and academic publications. This stage starts from the Ling-flash-2.0 checkpoint, which gives the model a solid foundation of general reasoning before medical specialization begins.
The second stage is Supervised Fine-Tuning (SFT), where the model is trained on a multi-source instruction dataset. This dataset mixes general reasoning tasks – math, programming, logic – to preserve reasoning ability, alongside clinical content such as patient Q&A, diagnostic reasoning, and safety-and-ethics cases.
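A multi-source SFT mixture like the one described is typically assembled by sampling sources according to fixed weights. The sketch below shows the idea; the source names and ratios are hypothetical, not the paper's actual data recipe.

```python
import random

# Hypothetical source names and mixing weights -- illustrative only,
# not AntAngelMed's actual SFT data ratios.
sources = {
    "general_math":     0.15,
    "code":             0.15,
    "logic":            0.10,
    "patient_qa":       0.30,
    "diagnostic_cases": 0.20,
    "safety_ethics":    0.10,
}

def sample_batch(sources: dict, batch_size: int, seed: int = 0) -> list:
    """Draw an SFT batch whose composition follows the mixing weights,
    keeping general reasoning data alongside clinical data."""
    rng = random.Random(seed)
    names, weights = zip(*sources.items())
    return rng.choices(names, weights=weights, k=batch_size)

batch = sample_batch(sources, batch_size=8)
```

Keeping general-domain sources at a nonzero weight is what guards against the model losing math and coding ability while it specializes.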
The third stage is Reinforcement Learning using the GRPO (Group Relative Policy Optimization) algorithm, combined with task-specific reward models. GRPO, originally introduced in the DeepSeekMath paper, is a variant of PPO that estimates the advantage baseline from group scores rather than from a separate critic model, making it computationally simpler. Here, reward signals are designed to shape the model's behavior toward empathy, structured clinical responses, safety guardrails, and evidence-based reasoning – all with the goal of reducing bias in clinical question answering.
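The critic-free part of GRPO is just a normalization over a group of sampled responses to the same prompt: each response's reward is scored relative to the group's mean and standard deviation. A minimal sketch:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages: each sampled response is scored against
    the mean and std of its own group, with no learned critic network."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Rewards for one group of responses sampled for the same prompt
group = np.array([0.2, 0.9, 0.5, 0.4])
adv = grpo_advantages(group)  # positive for above-average responses
```

Responses scoring above the group mean get positive advantages and are reinforced; below-average ones are pushed down, all without the separate value model PPO normally maintains.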
Inference Performance
On H20 hardware, AntAngelMed exceeds 200 tokens per second, which the research team reports is about 3× faster than a 36-billion-parameter dense model. With YaRN (Yet Another RoPE Extension), it supports a 128K context length – long enough to handle full clinical documents, extended patient histories, or multi-turn medical consultations.
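In practice, YaRN-style context extension is usually expressed as a small RoPE-scaling entry in the model config, as exposed by the Hugging Face `transformers` config format. The fragment below is illustrative: the native window and scaling factor are assumptions, not AntAngelMed's published values.

```python
# Illustrative config fragment for extending a model's context window
# with YaRN RoPE scaling (Hugging Face `transformers`-style fields).
# The native window (32K) and factor (4.0) are assumed numbers.
config_update = {
    "max_position_embeddings": 131072,  # 128K target context
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,                  # 32K native -> 128K extended
        "original_max_position_embeddings": 32768,
    },
}

scaling = config_update["rope_scaling"]
extended = int(scaling["factor"] * scaling["original_max_position_embeddings"])
```

The scaling factor times the original training window gives the extended context length, which is why the two fields must be kept consistent with `max_position_embeddings`.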
The research team also released an FP8 version of the model. When this quantization is combined with EAGLE3 speculative decoding, throughput at a concurrency of 32 improves markedly over FP8 alone: 71% on HumanEval, 45% on GSM8K, and 94% on Math-500. These benchmarks measure coding and math tasks – not medical tasks specifically – but serve as proxies for how the speedup holds up across different output types.
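Why speculative decoding speeds things up can be shown with a toy draft-and-verify loop. EAGLE3 trains a lightweight draft head rather than a separate model, but the accept/verify logic is conceptually like this sketch; the `speculative_step` helper and token sets are invented for illustration.

```python
def speculative_step(draft_tokens, target_accepts):
    """One speculative-decoding step: a cheap draft proposes several
    tokens, the full model verifies them in a single pass, and the step
    keeps the longest accepted prefix plus one corrected token.
    `target_accepts(tok)` stands in for the target model's check."""
    accepted = []
    for tok in draft_tokens:
        if target_accepts(tok):
            accepted.append(tok)          # verified draft token, nearly free
        else:
            accepted.append("corrected")  # fall back to the target's own token
            break
    return accepted

draft = ["the", "patient", "reports", "xyz", "pain"]
target_vocab_ok = {"the", "patient", "reports", "pain"}
out = speculative_step(draft, lambda t: t in target_vocab_ok)
# "xyz" is rejected, so this step still emits 4 tokens instead of 1
```

When most draft tokens are accepted, each target-model pass yields several tokens instead of one, which is where the reported throughput gains come from.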
Benchmark results
On HealthBench, an open-source medical evaluation benchmark from OpenAI that uses simulated multi-turn medical conversations to measure real-world clinical performance, AntAngelMed ranks first among all open-source models and outperforms several top proprietary models, with a particularly notable advantage on the HealthBench-Hard subset.
On MedAIBench, a benchmark maintained by the China National Artificial Intelligence Medical Industry Pilot Facility, AntAngelMed ranks near the top, with particularly strong scores in the medical knowledge Q&A and medical ethics and safety categories.
On MedBench, a benchmark for Chinese healthcare LLMs that includes 36 independently curated datasets and nearly 700,000 samples across five dimensions – medical knowledge question answering, medical language comprehension, medical language generation, complex medical reasoning, and safety and ethics – AntAngelMed ranks first overall.
Key Takeaways
- AntAngelMed is a 103B-parameter open medical LLM that activates only 6.1B parameters at inference time, using the 1/32 activation-ratio MoE design inherited from Ling-flash-2.0.
- It uses a three-stage training pipeline: continued pre-training on medical corpora, SFT on mixed general and clinical instruction data, and GRPO-based reinforcement learning for safety and diagnostic reasoning.
- On H20 hardware, the model exceeds 200 tokens/s and supports 128K context lengths using YaRN extrapolation, about 3× faster than a comparable 36B dense model.
- AntAngelMed ranks first among open-source models on OpenAI's HealthBench, surpassing several proprietary models, and tops both the MedAIBench and MedBench leaderboards.
- The model is available on Hugging Face, ModelScope, and GitHub; the model weights are Apache 2.0-licensed, the code is MIT-licensed, and a quantized FP8 version has also been released.
Check out the model weights on Hugging Face, the GitHub repo, and the technical details.



