
Tencent Open-Sources Hunyuan-A13B: A 13B-Active-Parameter MoE Model with Dual-Mode Reasoning and a 256K Context Window

Tencent's Hunyuan team has released Hunyuan-A13B, a new open-source large language model built on a sparse Mixture-of-Experts (MoE) architecture. While the model contains 80 billion parameters in total, only 13 billion are active during inference, offering a highly efficient balance between performance and compute cost. It supports Grouped Query Attention (GQA), a 256K context length, and a dual-mode reasoning framework that toggles between fast and slow thinking.

Designed for efficient deployment and agentic reasoning, Hunyuan-A13B achieves top-tier performance across agentic benchmarks including BFCL-v3, τ-Bench, C3-Bench, and ComplexFuncBench, often outperforming larger models in tool-calling and long-context scenarios.

Architecture: Sparse MoE with 13B Active Parameters

At its core, Hunyuan-A13B follows a fine-grained MoE design comprising 1 shared expert and 64 non-shared experts, with 8 experts activated per forward pass. This architecture, backed by scaling experiments, ensures performance consistency while keeping inference costs low. The model includes 32 layers, uses SwiGLU activations, has a 128K vocabulary size, and integrates GQA for improved memory efficiency during long-context inference.
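The shared-plus-routed expert layout described above can be sketched in a few lines of NumPy. This is a minimal illustration of top-k routing with a shared expert, not Hunyuan's actual implementation; the dimensions, router, and expert functions are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, shared_expert, experts, router_w, k=8):
    """Sketch of a shared-expert + routed-expert MoE layer.

    The shared expert sees every token; a softmax router selects the
    top-k routed experts and mixes their outputs by gate weight.
    (Dimensions and router details are illustrative only.)
    """
    logits = x @ router_w                 # routing scores, one per expert
    top = np.argsort(logits)[-k:]         # indices of the k chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                  # softmax over the selected experts
    out = shared_expert(x)                # shared expert always contributes
    for g, i in zip(gates, top):
        out = out + g * experts[i](x)     # only k of 64 experts run
    return out, top

d, n_experts = 16, 64
experts = [lambda x, W=rng.standard_normal((d, d)) / d: x @ W
           for _ in range(n_experts)]
shared = lambda x, W=rng.standard_normal((d, d)) / d: x @ W
router_w = rng.standard_normal((d, n_experts))

x = rng.standard_normal(d)
y, active = moe_layer(x, shared, experts, router_w)
```

Because only 8 of the 64 routed experts execute per token, the per-token FLOPs scale with the active parameter count, which is the mechanism behind the 80B-total / 13B-active efficiency claim.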

This MoE setup is paired with a structured training curriculum: a 20T-token pretraining phase, followed by fast annealing and long-context adaptation. This last phase scales the context window first to 32K and then to 256K tokens using NTK-aware positional encoding, ensuring stable performance at large context lengths.
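NTK-aware context extension, mentioned above, works by rescaling the rotary-embedding frequency base so that low frequencies stretch more than high ones. The sketch below uses the commonly cited formula and generic defaults (base 10000, head dimension 128); Hunyuan-A13B's actual positional-encoding configuration is not specified in the article.

```python
def ntk_scaled_rope_base(base, orig_ctx, target_ctx, head_dim):
    """NTK-aware scaling of the RoPE frequency base.

    new_base = base * s**(d / (d - 2)), where s = target_ctx / orig_ctx.
    Constants here are illustrative, not Hunyuan-A13B's real config.
    """
    s = target_ctx / orig_ctx
    return base * s ** (head_dim / (head_dim - 2))

def rope_freqs(base, head_dim):
    """Per-pair rotary frequencies: theta_i = base**(-2i / d)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Extending a 32K-trained model toward 256K context (an 8x stretch):
b32k = 10000.0
b256k = ntk_scaled_rope_base(b32k, 32_768, 262_144, head_dim=128)
```

Enlarging the base slows the rotation of the low-frequency dimensions, so positions far beyond the original training window still map to angles the model has effectively seen, which is why the extension remains stable without retraining from scratch.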

Dual-Mode Reasoning: Fast and Slow Thinking

A standout feature of Hunyuan-A13B is its dual-mode Chain-of-Thought (CoT) capability. It supports both a low-latency fast-thinking mode for routine queries and a more elaborate slow-thinking mode for multi-step reasoning. These modes are controlled through a simple tag system: /no think for fast inference and /think for reflective reasoning. This flexibility lets users adapt computational cost to task complexity.
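In practice the tag system amounts to prepending a control token to the query. A minimal sketch, assuming the tags are used verbatim as the article states them; the exact chat template Hunyuan-A13B expects may differ:

```python
def build_prompt(question: str, slow_thinking: bool = False) -> str:
    """Prepend the reasoning-mode tag to a user query.

    Tag spellings (/think, /no think) follow the article; the real
    model's prompt format should be checked against its model card.
    """
    tag = "/think" if slow_thinking else "/no think"
    return f"{tag} {question}"

fast = build_prompt("What is the capital of France?")
slow = build_prompt("Prove that sqrt(2) is irrational.", slow_thinking=True)
```

The appeal of this design is that one deployed model serves both cheap lookup-style traffic and expensive multi-step reasoning, with the caller choosing per request.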

Post-Training: Reinforcement Learning with Task-Specific Rewards

The post-training pipeline of Hunyuan-A13B includes multi-stage supervised fine-tuning (SFT) and reinforcement learning (RL) across both reasoning-specific and general tasks. The RL stages incorporate outcome-based rewards and tool-specific feedback, including sandbox execution for code and rule-based checking for agent tasks.

In the agent training phase, the team synthesized diverse tool-use scenarios with planner, checker, and tool roles, generating over 20,000 format combinations. This reinforced Hunyuan-A13B's ability to execute real-world workflows such as spreadsheet processing, information search, and structured reasoning.

Evaluation: State-of-the-Art Agentic Performance

Hunyuan-A13B shows strong benchmark results across diverse NLP tasks:

  • On MATH, CMATH, and GPQA, it scores on par with or above larger dense and MoE models.
  • It surpasses Qwen3-A22B and DeepSeek R1 in logical reasoning (BBH: 89.1; ZebraLogic: 84.7).
  • In coding, it holds its own with 83.9 on MBPP and 69.3 on MultiPL-E.
  • On agent tasks, it leads on BFCL-v3 (78.3) and ComplexFuncBench (61.2), validating its tool-usage capabilities.

Long-context comprehension is another highlight. On PenguinScrolls, it scores 87.7, just shy of Gemini 2.5 Pro. On RULER, it sustains high performance (73.9) even in the 64K–128K context range, outperforming larger models such as Qwen3-A22B and DeepSeek R1 in context resilience.

Inference Optimization and Deployment

Hunyuan-A13B is fully integrated with popular inference frameworks such as vLLM, SGLang, and TensorRT-LLM. It supports precision formats like W16A16, W8A8, and KV Cache FP8, along with features like Auto Prefix Caching and Chunked Prefill. It achieves up to 1981.99 tokens/sec throughput on a 32-batch workload (2048 input, 14336 output length), making it practical for real-time applications.
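As a deployment sketch, the features above map onto standard vLLM server flags. The model identifier and flag spellings are assumptions based on the article and recent vLLM releases, so verify them against the model card and your installed vLLM version before use:

```shell
# Hypothetical launch of an OpenAI-compatible server with vLLM.
# FP8 KV cache, prefix caching, and chunked prefill mirror the
# optimizations named in the article.
vllm serve tencent/Hunyuan-A13B-Instruct \
  --max-model-len 32768 \
  --kv-cache-dtype fp8 \
  --enable-prefix-caching \
  --enable-chunked-prefill
```

Raising `--max-model-len` toward the full 256K window is possible in principle but is gated by available GPU memory for the KV cache.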

Open Source and Industry Relevance

Available on Hugging Face and GitHub, Hunyuan-A13B is released with permissive open-source licensing. It is engineered for efficient research and production use, especially in latency-sensitive environments and long-context tasks.

By combining MoE efficiency, agentic reasoning, and open-source accessibility, Tencent's Hunyuan-A13B offers a compelling alternative to heavyweight LLMs, enabling broader experimentation and deployment without sacrificing capability.


Check out the Paper. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news in a way that is technically sound yet easily understandable. The platform draws more than 2 million monthly views, reflecting its popularity among readers.

