Generative AI

DeepReinforce Releases Ornith-1.0: An Open Source Model Family That Learns Its Own RL Scarves





DeepReinforce released Ornith-1.0an open source model family built for agent coding. The range includes four sizes, from the compact 9B model to the 397B hybrid-professional standout. All testing sites are licensed under the MIT Hugging Face license. The models were post-trained over the pre-trained Gemma 4 and Qwen 3.5.

Most coding agents pair a model with a fixed, custom-designed harness. Ornith-1.0 instead learns to write his own. The DeepReinforce research team reports state-of-the-art results among open models of the same size.

The TL;DR

  • Ornith-1.0 ships in sizes 9B, 31B, 35B-MoE, and 397B-MoE under MIT, built on Gemma 4 and Qwen 3.5.
  • The model learns its scaffolding during RL, co-optimizing the harness and solution.
  • The Ornith-1.0-397B tops the Claude Opus 4.7 in both benchmarks, but not the Opus 4.8 or the larger GLM-5.2-744B.
  • Three layers – fixed trust boundary, deterministic monitoring, frozen LLM judge – prevent reward hacking.

What is Ornith-1.0?

Ornith-1.0 is a set of reasoning models tuned by coding agents. The exceptions are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is a mix-of-artists and activates about 3B parameters per token. FP8 and GGUF builds are also published for immediate local deployment.

Each model is a conceptual model. Answers are opened with a block before the last answer. The feed recipes enable the analyzer to think, so that the trace returns separately reasoning_content field. Models also issue well-formed tool calls for agent loops.

Shipping is straightforward. The 9B model is about 19GB in bf16 and runs on a single 80GB GPU. Providing recipes for vLLM, SGlang, and Transformers. Each model presents an OpenAI compatible endpoint. So standard agent frameworks work without any code changes.

Interactive Descriptor


Ai Agent Protection against Policy, Rules, Processes
Back to top button