Liquid AI's LFM2-2.6B-Exp Uses Pure Reinforcement Learning to Sharpen Instruction Following, Knowledge, and Math in a Small Hybrid Reasoning Model

Liquid AI has launched LFM2-2.6B-Exp, an experimental checkpoint of its LFM2-2.6B language model trained with pure reinforcement learning on top of the existing LFM2 stack. The goal is simple: improve instruction following, knowledge tasks, and math in a small 3B-class model oriented toward on-device and edge deployment.
Where Does LFM2-2.6B-Exp Fit Into the LFM2 Family?
LFM2 is the second generation of Liquid Foundation Models, designed for efficient deployment on phones, laptops, and other edge devices. Liquid AI describes LFM2 as a hybrid architecture that combines short-range gated LIV convolution blocks with grouped query attention blocks, controlled by multiplicative gates.
The family includes 4 compact sizes: LFM2-350M, LFM2-700M, LFM2-1.2B, and LFM2-2.6B. They all share a context length of 32,768 tokens, a vocabulary size of 65,536, and bfloat16 precision. The 2.6B model uses 30 layers, with 22 convolution and 8 attention layers. Each size is trained on a 10 trillion token budget.
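The per-size numbers above can be captured in a quick sanity check; the figures below are taken directly from the text and treated as reported, not independently verified:

```python
# Reported LFM2-2.6B specs from the article (not independently verified).
LFM2_2_6B = {
    "total_layers": 30,
    "conv_layers": 22,        # short-range gated LIV convolution blocks
    "attention_layers": 8,    # grouped query attention blocks
    "context_length": 32_768, # tokens
    "vocab_size": 65_536,
    "dtype": "bfloat16",
    "train_tokens": 10 * 10**12,  # 10T token budget
}

# Sanity check: the convolution/attention split should account for every layer.
assert LFM2_2_6B["conv_layers"] + LFM2_2_6B["attention_layers"] == LFM2_2_6B["total_layers"]
```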
LFM2-2.6B is already positioned as the most capable model in the family. It reaches 82.41 percent on GSM8K and 79.56 percent on IFEval, which puts it ahead of several 3B-class models such as Llama 3.2 3B Instruct, Gemma 3 4B it, and SmolLM3 3B on these benchmarks.
LFM2-2.6B-Exp retains these properties. It uses the same tokenizer, context window, and hardware profile. The experimental checkpoint focuses solely on changing behavior through the reinforcement learning phase.

Pure RL on Top of a Pre-Trained, Aligned Foundation
This experimental checkpoint builds on LFM2-2.6B using pure reinforcement learning, trained specifically on instruction following, knowledge tasks, and mathematics.
The underlying LFM2 training stack consists of several stages. It includes large-scale supervised fine-tuning on a mix of downstream tasks and general domains, Direct Preference Optimization with length normalization, iterative model merging, and reinforcement learning with verifiable rewards.
But what exactly does 'pure reinforcement learning' mean here? LFM2-2.6B-Exp starts from the existing LFM2-2.6B checkpoint and goes through a sequential RL curriculum. It begins with instruction following, then extends RL training to knowledge-oriented tasks, math, and a small amount of tool use, without an additional SFT warm-up or distillation step in that final phase.
The important point is that LFM2-2.6B-Exp does not change the base architecture or the earlier training stages. It refines the policy with an RL phase that applies verifiable rewards to a targeted set of domains, over a model that is already supervised fine-tuned and preference aligned.
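The 'verifiable rewards' idea can be illustrated with a minimal sketch: for a math item, the reward is a checkable function of the model's output and a gold answer, rather than a score from a learned reward model. The regex-based answer extraction below is a simplification of how such graders typically work, not Liquid AI's actual implementation:

```python
import re

def verifiable_math_reward(completion: str, gold: str) -> float:
    """Binary reward in the verifiable-rewards style: 1.0 if the last
    number in the completion matches the gold answer, else 0.0.
    No learned reward model; the signal is checkable from ground truth."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == gold else 0.0

# Toy batch: score several sampled completions for one GSM8K-style item.
completions = [
    "We add 12 and 30 to get 42.",
    "The total is 41.",
    "I am not sure.",
]
rewards = [verifiable_math_reward(c, "42") for c in completions]
print(rewards)  # → [1.0, 0.0, 0.0]
```

In an RL loop, these per-sample rewards would feed a policy-gradient update; here the point is only that the grader is deterministic and auditable.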
Benchmark Signal, Especially on IFBench
The Liquid AI team highlights IFBench as the headline metric. IFBench is an instruction-following benchmark that tests how reliably a model follows complex, conditional instructions. On this benchmark, LFM2-2.6B-Exp outperforms DeepSeek R1-0528, which is reported to be 263 times larger in parameter count.
LFM2 models deliver robust performance on a common set of benchmarks such as MMLU, GPQA, IFEval, GSM8K, and related suites. The base 2.6B model already competes well in the 3B segment. The RL checkpoint then pushes instruction following and math further, while staying within the same 3B parameter budget.
Essential Properties and Skills
The architecture uses 10 short-range gated LIV convolution blocks and 6 grouped query attention blocks, arranged in a hybrid stack. This design reduces KV cache cost and keeps inference fast on consumer GPUs and NPUs.
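The KV cache saving is easy to quantify: only attention layers hold K and V tensors, so a hybrid stack caches a fraction of what an all-attention model of the same depth would. The head count and head dimension below are hypothetical placeholders, not published LFM2 values; only the 8-of-30 attention-layer split for the 2.6B model comes from the text:

```python
def kv_cache_bytes(n_attn_layers: int, seq_len: int, n_kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes for K and V caches: 2 tensors (K and V) per attention layer,
    each [seq_len, n_kv_heads, head_dim], at dtype_bytes per element
    (2 for bfloat16)."""
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * dtype_bytes

# Hypothetical head geometry (NOT from the article), held fixed for comparison.
SEQ, KV_HEADS, HEAD_DIM = 32_768, 8, 64

hybrid = kv_cache_bytes(8, SEQ, KV_HEADS, HEAD_DIM)   # 8 attention layers
dense = kv_cache_bytes(30, SEQ, KV_HEADS, HEAD_DIM)   # all-attention baseline
print(f"hybrid/dense KV cache ratio: {hybrid / dense:.2f}")  # → 0.27
```

Whatever the exact head geometry, the ratio is simply 8/30, since convolution layers contribute nothing to the cache.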
The pre-training mix uses about 75 percent English, 20 percent multilingual data, and 5 percent code. Supported languages include English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
LFM2 models expose a ChatML-like template and native tool-use tokens. Tools are defined as JSON inside dedicated tool list tags. The model then emits Python-like calls between tool call tags and reads tool results between tool response tags. This makes the model suitable as an agent core for tool-calling stacks without custom prompt engineering.
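A tool-calling harness around such a model boils down to extracting and parsing the emitted calls. The sketch below assumes placeholder tag strings (`<|tool_call_start|>` / `<|tool_call_end|>`), which are illustrative, not the model's documented special tokens:

```python
import ast
import re

# Placeholder tag strings, NOT the model's real special tokens.
CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)

def parse_tool_calls(model_output: str) -> list:
    """Extract Python-like calls emitted between tool-call tags and turn
    each into (function_name, kwargs) using the ast module, so arguments
    are parsed safely without eval()."""
    calls = []
    for src in CALL_RE.findall(model_output):
        node = ast.parse(src.strip(), mode="eval").body
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls

out = 'Let me check. <|tool_call_start|>get_weather(city="Paris")<|tool_call_end|>'
print(parse_tool_calls(out))  # → [('get_weather', {'city': 'Paris'})]
```

The harness would dispatch each parsed call to a real function, then feed the JSON result back between the tool response tags for the model's next turn.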
LFM2-2.6B, and by extension LFM2-2.6B-Exp, is also the only model in the family that supports hybrid reasoning through special reasoning tokens for complex or multilingual input. That ability carries over unchanged, because the RL experiment does not alter the tokenizer or the architecture.
Key Takeaways
- LFM2-2.6B-Exp is an experimental checkpoint of LFM2-2.6B that adds a pure reinforcement learning phase on top of an already supervised and preference-aligned base, targeted at instruction following, knowledge tasks, and math.
- The LFM2-2.6B core uses a hybrid architecture of double-gated short-range LIV convolution blocks and grouped query attention blocks, with 30 layers (22 convolution and 8 attention), a context length of 32,768 tokens, and a 10 trillion token training budget at 2.6B parameters.
- LFM2-2.6B already posts strong benchmark scores in the 3B class, about 82.41 percent on GSM8K and 79.56 percent on IFEval, and the LFM2-2.6B-Exp RL checkpoint further improves instruction following and math performance without changing the architecture or memory profile.
- Liquid AI reports that on IFBench, an instruction-following benchmark, LFM2-2.6B-Exp outperforms DeepSeek R1-0528 even though the latter has far more parameters, showing strong per-parameter performance for constrained deployment settings.
- LFM2-2.6B-Exp is released on Hugging Face with open weights under the LFM Open License v1.0 and is supported by Transformers, vLLM, llama.cpp GGUF quantizations, and ONNXRuntime, which makes it a practical 3B-class model for agent systems, structured data extraction, and retrieval-augmented assistants on device.
Max is an AI analyst at MarkTechPost, based in Silicon Valley, who is actively shaping the future of technology. He teaches robots at Brainvyne, fights spam with ComplyEmail, and uses AI every day to translate complex technological advances into clear, understandable information.



