Generative AI

Meituan Releases LongCat-2.0: A 1.6T-Parameter Open MoE Model with 1M Native Content and LongCat Sparse Attention

Meituan released LongCat-2.0a large Mixture-of-Experts (MoE) language model. It carries 1.6 trillion total parameters and open about 48 billion per token. The model addresses agent coding: code understanding, generation, and execution within an agent's workflow.

Two facts stand out. First, LongCat-2.0 supports native 1 million token context window. Second, both training and worship continued in full domestic AI ASIC superpods.

What is LongCat-2.0?

LongCat-2.0 is Meituan's next-generation trillion open parameter model. It follows the LongCat-Flash, a 560B model released in 2025. The architecture was designed around one goal: reliable, efficient agent recording.

Early training took more than that 35 trillion tokens over millions of hours of acceleration. Meituan reports no rollbacks or spikes of unrecoverable losses during the run. That claim of stability is important on non-Nvidia hardware, where tooling is less mature.

Architecture: How the 1.6T Model Stays Cheap to Run

The design includes four ideas that reduce the cost of scale. Each one deserves to be understood on its own.

  • Zero-computation experts: Not all tokens require heavy computing. Simple tokens are like a path of punctuation to a master calculator and back unchanged. Complex tokens involve more technical capacity. The PID controller adjusts the professional bias to hold the average range. This produces a variable opening window of 33B–56B instead of a fixed cost. The core of the MoE uses a circuit-connected design (ScMoE) for maximum efficiency.
  • LongCat Sparse Attention (LSA): The standard attention span measures four times the length of the context. LSA selects only the most relevant tokens, scaling down along the line. Meituan describes it as an evolution of DeepSeek Sparse Attention (DSA). It includes three orthogonal pointing methods. Stream-aware indexing converts discrete memory into contiguous blocks. Cross-Layer Indexing reuses attention across adjacent layers. Hierarchical Indexing uses two-stage coarse-to-fine sorting. Together they support a 1M token window without a memory wall.
  • N-gram embedding: Design adds 135 billion N-gram embedding module. It is always orthogonal to the MoE experts in small dimensions. Meituan says it captures the dense local token relationship. It also reduces memory I/O during large batch recording.
  • After training (MOPD): The dedicated pipeline (MOPD) includes three groups of teachers. This combines the capabilities of Agent, Consultation, and Interaction into one unified model.

For delivery, Meituan uses a 6D matching scheme and a distributed coding architecture. It also uses 'super kernels' and L2-cache weight prefetching to hide I/O latency.

Measurements

Meituan sets LongCat-2.0 as the agent code model. All the illustrations below are from Meituan's own experiments.

Benchmark LongCat-2.0 What it measures
SWE-bench Pro 59.5 Real world software engineering jobs
Terminal-Bench 2.1 70.8 Execution and error detection in shells
SWE-bench Multilingual 77.3 Multilingual database functions

On SWE-bench Pro, Meituan reports LongCat-2.0 ranking GPT-5.5 (58.6). Meituan also claims that overall performance is comparable to Google's Gemini 3.1 Pro. The reported edge is focused on software engineering. In general agent benchmarks such as FORTE and BrowseComp, the coverage shows that it follows the leading boundary systems. Independent leaderboard verification is not yet available.

LongCat-2.0 vs LongCat-Flash

The leap from the previous generation is huge on paper. This table uses the published specifications for each model.

Attribute LongCat-2.0 LongCat-Flash
Total parameters 1.6T 560B
Valid per token ~48B (33B–56B) ~27B (18.6B–31.3B)
Content window 1M tokens (native) 128K tokens
Long content attention LongCat Sparse Attention Multi-head Latent Attention
Reported hardware AI ASIC home superpods (training + serving) H800 GPUs (assumptions reported)
High output 128K tokens Not specified
License MIT MIT
Released June 30, 2026 September 2025
Weights Coming soon Open it

Use Cases with examples

LongCat-2.0 is open to agent-style software work, not just chat. Few concrete patterns match its strength.

  • The thinking of the entire archive: Serve the entire medium-sized codebase in a 1M token window. Ask the model to trace a bug in multiple files at once. This avoids compression hacks that force short windows.
  • Multi-step terminal functions: Run the model inside an agent loop with shell access. It can execute commands, learn errors, and retry until the task passes. Terminal-Bench 2.1 focuses on this workflow.
  • Storage level planning: Request a refactor that includes several modules and tests. The model lays out reasons over the full context before proposing integrated changes.
  • Migration to other languages: Use the SWE benchmark for multilingual capabilities in polyglot repositories. The model can encode logic between languages ​​while preserving behavior.

These patterns work within the standard agent harness. So Dev teams can use the model without building new tools.

How to get it

LongCat-2.0 is accessible through the LongCat API Platform. It exposes both OpenAI and Anthropic compatible endpoints. The model is also in OpenRouter and in harnesses such as Claude Code, OpenClaw, OpenCode, and Codex. Local restraint has not yet taken place, as the weights are still pending.

The endpoint compatible with OpenAI uses the model ID LongCat-2.0. The total length of the issuance is 131072 tokens (128K). The caption below calls the conclusion to end the written discussion.

# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_LONGCAT_API_KEY",
    base_url="
)

resp = client.chat.completions.create(
    model="LongCat-2.0",
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py to remove duplicate I/O logic."},
    ],
    max_tokens=4096,  # LongCat-2.0 supports up to 131072 (128K)
)

print(resp.choices[0].message.content)

The price is reported at $0.75 per million input tokens and $2.95 per million output. The launch promotion lists $0.30 and $1.20, with the core content being read for free. These figures come from third party installations and are subject to change.

Interactive Descriptor

Ai Agent Protection against Policy, Rules, Processes
Back to top button