
Tencent Open-Sources Hunyuan-A13B: A 13B-Active-Parameter MoE Model with Dual-Mode Reasoning and a 256K Context Window

Tencent's Hunyuan team has released Hunyuan-A13B, a new open-source large language model built on a sparse Mixture-of-Experts (MoE) architecture. While the model contains 80 billion parameters in total, only 13 billion are active during inference, offering a highly efficient balance between performance and compute cost. It supports Grouped Query Attention (GQA), a 256K context length, and a dual-mode reasoning framework that toggles between fast and slow thinking.

Designed for efficient deployment and agentic reasoning, Hunyuan-A13B achieves top-tier performance across agentic benchmarks including BFCL-v3, τ-Bench, C3-Bench, and ComplexFuncBench, often outperforming larger models in tool-calling and long-context scenarios.

Architecture: Sparse MoE with 13B Active Parameters

At its core, Hunyuan-A13B follows a fine-grained MoE design comprising 1 shared expert and 64 non-shared experts, with 8 experts activated per forward pass. This architecture, backed by scaling experiments, ensures performance consistency while keeping inference costs low. The model includes 32 layers, uses SwiGLU activations, has a 128K vocabulary size, and integrates GQA for improved memory efficiency during long-context inference.
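The shared-plus-routed expert layout described above can be sketched in a few lines of NumPy. This is a minimal illustration of top-k routing with a shared expert, not Hunyuan's actual implementation; the dimensions, router, and expert functions are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, shared_expert, experts, router_w, k=8):
    """Sketch of a shared-expert + routed-expert MoE layer.

    The shared expert sees every token; a softmax router selects the
    top-k routed experts and mixes their outputs by gate weight.
    (Dimensions and router details are illustrative only.)
    """
    logits = x @ router_w                 # routing scores, one per expert
    top = np.argsort(logits)[-k:]         # indices of the k chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                  # softmax over the selected experts
    out = shared_expert(x)                # shared expert always contributes
    for g, i in zip(gates, top):
        out = out + g * experts[i](x)     # only k of 64 experts run
    return out, top

d, n_experts = 16, 64
experts = [lambda x, W=rng.standard_normal((d, d)) / d: x @ W
           for _ in range(n_experts)]
shared = lambda x, W=rng.standard_normal((d, d)) / d: x @ W
router_w = rng.standard_normal((d, n_experts))

x = rng.standard_normal(d)
y, active = moe_layer(x, shared, experts, router_w)
```

Because only 8 of the 64 routed experts execute per token, the per-token FLOPs scale with the active parameter count, which is the mechanism behind the 80B-total / 13B-active efficiency claim.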

This MoE setup is paired with a structured training curriculum: a 20T-token pretraining phase, followed by fast annealing and long-context adaptation. This last phase scales the context window first to 32K and then to 256K tokens using NTK-aware positional encoding, ensuring stable performance at large context lengths.
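NTK-aware context extension, mentioned above, works by rescaling the rotary-embedding frequency base so that low frequencies stretch more than high ones. The sketch below uses the commonly cited formula and generic defaults (base 10000, head dimension 128); Hunyuan-A13B's actual positional-encoding configuration is not specified in the article.

```python
def ntk_scaled_rope_base(base, orig_ctx, target_ctx, head_dim):
    """NTK-aware scaling of the RoPE frequency base.

    new_base = base * s**(d / (d - 2)), where s = target_ctx / orig_ctx.
    Constants here are illustrative, not Hunyuan-A13B's real config.
    """
    s = target_ctx / orig_ctx
    return base * s ** (head_dim / (head_dim - 2))

def rope_freqs(base, head_dim):
    """Per-pair rotary frequencies: theta_i = base**(-2i / d)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Extending a 32K-trained model toward 256K context (an 8x stretch):
b32k = 10000.0
b256k = ntk_scaled_rope_base(b32k, 32_768, 262_144, head_dim=128)
```

Enlarging the base slows the rotation of the low-frequency dimensions, so positions far beyond the original training window still map to angles the model has effectively seen, which is why the extension remains stable without retraining from scratch.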

Dual-Mode Reasoning: Fast and Slow Thinking

A standout feature of Hunyuan-A13B is its dual-mode Chain-of-Thought (CoT) capability. It supports both a low-latency fast-thinking mode for routine queries and a more elaborate slow-thinking mode for multi-step reasoning. These modes are controlled through a simple tag system: /no think for fast inference and /think for reflective reasoning. This flexibility lets users adapt computational cost to task complexity.
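In practice the tag system amounts to prepending a control token to the query. A minimal sketch, assuming the tags are used verbatim as the article states them; the exact chat template Hunyuan-A13B expects may differ:

```python
def build_prompt(question: str, slow_thinking: bool = False) -> str:
    """Prepend the reasoning-mode tag to a user query.

    Tag spellings (/think, /no think) follow the article; the real
    model's prompt format should be checked against its model card.
    """
    tag = "/think" if slow_thinking else "/no think"
    return f"{tag} {question}"

fast = build_prompt("What is the capital of France?")
slow = build_prompt("Prove that sqrt(2) is irrational.", slow_thinking=True)
```

The appeal of this design is that one deployed model serves both cheap lookup-style traffic and expensive multi-step reasoning, with the caller choosing per request.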

Post-Training: Reinforcement Learning with Task-Specific Rewards

The post-training pipeline of Hunyuan-A13B includes multi-stage supervised fine-tuning (SFT) and reinforcement learning (RL) across both reasoning-specific and general tasks. The RL stages incorporate outcome-based rewards and tool-specific feedback, including sandbox execution for code and rule-based checking for agent tasks.

In the agent training phase, the team synthesized diverse tool-use scenarios with planner, checker, and tool roles, generating over 20,000 format combinations. This reinforced Hunyuan-A13B's ability to execute real-world workflows such as spreadsheet processing, information search, and structured reasoning.

Evaluation: State-of-the-Art Agentic Performance

Hunyuan-A13B shows strong benchmark results across diverse NLP tasks:

  • On MATH, CMATH, and GPQA, it scores on par with or above larger dense and MoE models.
  • It surpasses Qwen3-A22B and DeepSeek R1 in logical reasoning (BBH: 89.1; ZebraLogic: 84.7).
  • In coding, it holds its own with 83.9 on MBPP and 69.3 on MultiPL-E.
  • On agent tasks, it leads on BFCL-v3 (78.3) and ComplexFuncBench (61.2), validating its tool-usage capabilities.

Long-context comprehension is another highlight. On PenguinScrolls, it scores 87.7, just shy of Gemini 2.5 Pro. On RULER, it sustains high performance (73.9) even in the 64K–128K context range, outperforming larger models such as Qwen3-A22B and DeepSeek R1 in context resilience.

Inference Optimization and Deployment

Hunyuan-A13B is fully integrated with popular inference frameworks such as vLLM, SGLang, and TensorRT-LLM. It supports precision formats like W16A16, W8A8, and KV Cache FP8, along with features like Auto Prefix Caching and Chunked Prefill. It achieves up to 1981.99 tokens/sec throughput on a 32-batch workload (2048 input, 14336 output length), making it practical for real-time applications.
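As a deployment sketch, the features above map onto standard vLLM server flags. The model identifier and flag spellings are assumptions based on the article and recent vLLM releases, so verify them against the model card and your installed vLLM version before use:

```shell
# Hypothetical launch of an OpenAI-compatible server with vLLM.
# FP8 KV cache, prefix caching, and chunked prefill mirror the
# optimizations named in the article.
vllm serve tencent/Hunyuan-A13B-Instruct \
  --max-model-len 32768 \
  --kv-cache-dtype fp8 \
  --enable-prefix-caching \
  --enable-chunked-prefill
```

Raising `--max-model-len` toward the full 256K window is possible in principle but is gated by available GPU memory for the KV cache.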

Open Source and Industry Relevance

Available on Hugging Face and GitHub, Hunyuan-A13B is released with permissive open-source licensing. It is engineered for efficient research and production use, especially in latency-sensitive environments and long-context tasks.

By combining MoE efficiency, agentic reasoning, and open-source accessibility, Tencent's Hunyuan-A13B offers a compelling alternative to heavyweight LLMs, enabling broader experimentation and deployment without sacrificing capability.


Check out the Paper. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news in a way that is technically sound yet easily understandable. The platform draws more than 2 million monthly views, reflecting its popularity among readers.

