DeepAgent: A Deep Reasoning AI Agent that Performs Autonomous Thinking, Tool Discovery, and Action Execution within a Single Reasoning Process

nimda November 1, 2025

Most agent frameworks still run a predefined reason, act, observe loop, so the agent can only use tools that are injected into its prompt. This works for small tasks, but it fails when the toolset is large, when the task is long, and when the agent has to change strategy in the middle of reasoning. A team from Renmin University of China and Xiaohongshu proposes DeepAgent, an end-to-end deep reasoning agent that keeps all of this inside one unified reasoning process.

Unified reasoning with on-demand tool discovery

DeepAgent lets the model emit four action types directly in text: internal thought, tool search, tool call, and memory fold. When the agent decides to search, it queries a dense index of tool descriptions drawn from large registries, for example more than 16,000 RapidAPI tools, and only the top-ranked tools are returned to the context. This makes tool access dynamic: the model does not depend on a front-loaded tool list, and it stays aligned with real environments where the available tools change.
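
To make the retrieval step concrete, here is a minimal sketch of top-k dense tool search, assuming each tool description has already been embedded as a vector. The registry, the tool names, and the 3-dimensional embeddings are made up for illustration; a real system would use a learned encoder and an approximate nearest-neighbor index over thousands of tools, and this is not DeepAgent's own retriever.

```python
import math

# Hypothetical toy registry: tool name -> embedding of its description.
# A real system would embed descriptions with a dense encoder; these
# 3-dimensional vectors are invented purely for illustration.
TOOL_INDEX = {
    "weather_lookup": [0.9, 0.1, 0.0],
    "movie_search": [0.1, 0.9, 0.1],
    "music_search": [0.0, 0.8, 0.5],
    "currency_convert": [0.2, 0.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_tools(query_vec, index, k=2):
    """Return the k tool names whose description embeddings are closest to the query."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]


# A query embedding that is "about" films and music.
print(search_tools([0.05, 0.95, 0.3], TOOL_INDEX, k=2))  # prints ['movie_search', 'music_search']
```

Only the returned tools (and their descriptions) would enter the context, which is why the prompt does not need to carry the full registry.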

Autonomous memory folding for long-horizon tasks

Long sequences of tool calls, web results, and code outputs would overflow the context. DeepAgent solves this with an autonomous memory folding step. When the model emits a fold token, an auxiliary LLM compresses the full history into three memories: an episodic memory that records the key task events, a working memory that records the current sub-goal and recent constraints, and a tool memory that records tool names, arguments, and results. These memories are fed back as structured text, so the agent continues from a compact but information-rich state.
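
The folding step can be sketched as a function that turns a raw history into the three structured memories and serializes them as JSON text. This is a hypothetical illustration: `FoldedMemory`, `fold_history`, and the event format are assumptions, and the rule-based summary below stands in for the auxiliary LLM that performs the compression in DeepAgent.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FoldedMemory:
    """Three structured memories, mirroring the episodic / working / tool split."""
    episodic: list = field(default_factory=list)  # key task events so far
    working: dict = field(default_factory=dict)   # current sub-goal and recent issues
    tool: list = field(default_factory=list)      # tool names, arguments, outcomes


def fold_history(history):
    """Compress a raw interaction history into a FoldedMemory.

    Hypothetical stand-in for the auxiliary LLM: simple rules over
    event dicts instead of model-generated summaries.
    """
    memory = FoldedMemory()
    for event in history:
        if event["type"] == "tool_call":
            memory.tool.append({
                "name": event["name"],
                "args": event["args"],
                "outcome": event["result"][:40],  # keep only the first 40 characters
            })
            memory.episodic.append(f"called {event['name']}")
        elif event["type"] == "subgoal":
            memory.working["subgoal"] = event["text"]
        elif event["type"] == "error":
            memory.working.setdefault("recent_issues", []).append(event["text"])
    return memory


history = [
    {"type": "subgoal", "text": "find a sci-fi film released in 2010"},
    {"type": "tool_call", "name": "movie_search", "args": {"year": 2010},
     "result": "Inception; Tron: Legacy; Monsters; plus 500 more rows"},
    {"type": "error", "text": "rate limit hit on second page"},
]

folded = fold_history(history)
# The folded state replaces the raw history and is fed back as structured text.
print(json.dumps(asdict(folded), indent=2))
```

The design point is that the long tool output survives only as a short outcome string, while the sub-goal and the recent failure stay explicit for the next reasoning step.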

ToolPO, reinforcement learning for tool use

Supervised traces do not teach robust tool use, because the correct tool calls are only a few tokens inside a long generation. The research team introduces Tool Policy Optimization, ToolPO, to fix this. ToolPO runs rollouts against LLM-simulated APIs, so training is stable and cheap; it then attributes the reward directly to the tool-call tokens and trains with a clipped PPO-style objective. This is how the agent learns not only to call tools, but also to decide when to search and when to fold memory.
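
The credit-assignment idea behind ToolPO can be sketched with two pieces: a per-token advantage that adds an extra tool-call term only at tool-call token positions, and the standard clipped PPO surrogate. The decomposition into a task-level advantage plus a tool-call advantage, the function names, and the numbers below are assumptions for illustration rather than the paper's exact formulation.

```python
import math


def token_advantages(call_mask, task_adv, call_adv):
    """Per-token advantage (assumed decomposition): a task-level term for every
    token, plus an extra tool-call term only where call_mask is 1."""
    return [task_adv + m * call_adv for m in call_mask]


def clipped_ppo_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO clipped surrogate over tokens (to be maximized; a loss is its negative)."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)                     # pi_new / pi_old for this token
        clipped = max(min(ratio, 1 + eps), 1 - eps)     # clip ratio into [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)        # pessimistic of the two terms
    return total / len(advantages)


call_mask = [0, 0, 1, 1, 0]  # hypothetical: tokens 2 and 3 form the tool call
adv = token_advantages(call_mask, task_adv=0.5, call_adv=1.0)
print(adv)  # [0.5, 0.5, 1.5, 1.5, 0.5]

logp_old = [math.log(0.5)] * 5
obj = clipped_ppo_objective(logp_old, logp_old, adv)  # unchanged policy: ratio = 1
print(round(obj, 6))  # 0.9
```

Because the tool-call term is added only where `call_mask` is 1, a correct call shifts the gradient on exactly those few tokens instead of being diluted across the whole generation.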

Benchmarks, labeled tools vs open set tools

The research team evaluates on 5 general tool-use benchmarks, ToolBench, API-Bank, TMDB, Spotify, and ToolHop, and on 4 downstream tasks, ALFWorld, WebShop, GAIA, and HLE. In the labeled-tool setting, where every method is given the exact tools it needs, DeepAgent-32B-RL reports 85.0 on ToolBench and 85.3 on ToolHop, the strongest result at the 32B level. Workflow baselines such as ReAct and CodeAct can match it on single datasets, for example ReAct with strong models scores high on TMDB and Spotify, but none of them stays high across all 5, so the fair summary is that DeepAgent is more uniform.

In the open-set retrieval setting, which is the realistic one, DeepAgent must first find the tools and then call them. Here DeepAgent-32B-RL reaches 64.0 on ToolBench and 40.6 on ToolHop, while the strongest workflow baselines reach 55.0 on ToolBench and 36.2 on ToolHop, so the end-to-end agent still holds the lead. The research team also shows that autonomous tool retrieval by itself lifts workflow agents, but DeepAgent gains more, which suggests that its architecture and training are matched to large toolsets.

Downstream environments

On ALFWorld, WebShop, GAIA, and HLE, all evaluated with a 32B reasoning model, DeepAgent reports a 91.8 percent success rate on ALFWorld and a score of 56.3 on GAIA, and it scores higher than workflow agents. These tasks are long and noisy, so the combination of memory folding and ToolPO is the likely source of the gap.

Key Takeaways

DeepAgent keeps the entire agent loop inside a single reasoning stream: the model can think, search for tools, call them, and continue, so it is not limited to a fixed ReAct-style workflow.
It uses dense retrieval over large tool registries, more than 16,000 RapidAPI tools and nearly 3,900 ToolHop tools, so tools do not have to be listed in advance; they are discovered on demand.
The autonomous memory folding module compresses long interaction histories into episodic, working, and tool memories, which prevents context overflow and keeps long-horizon reasoning stable.
Tool Policy Optimization, ToolPO, trains tool use end to end with LLM-simulated APIs and token-level advantage attribution, so the agent learns to issue correct tool calls, not only to reach the final answer.
On 5 tool benchmarks and 4 downstream tasks, DeepAgent at the 32B scale is more consistent than workflow baselines in both the labeled-tool and open-set settings, especially on ToolHop, where tool availability is critical.
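
The single-stream loop from the takeaways above can be sketched as a dispatcher over tagged actions. The tag names (`<think>`, `<tool_search>`, `<tool_call>`, `<fold>`), `run_agent`, and the stub callbacks are hypothetical; the scripted `steps` list stands in for a model that would normally generate each step from the current context.

```python
import json
import re

# Hypothetical action tags; DeepAgent's real output format may differ.
ACTION_RE = re.compile(r"<(think|tool_search|tool_call|fold)>(.*?)</\1>", re.S)


def run_agent(model_steps, search_fn, call_fn, fold_fn, max_steps=10):
    """Drive one reasoning stream: each step emits one tagged action and the loop
    appends its result to the context, instead of a fixed think/act/observe schedule."""
    context = []
    for step in model_steps[:max_steps]:
        match = ACTION_RE.search(step)
        if match is None:
            context.append(("answer", step))  # untagged text is treated as the final answer
            break
        kind, body = match.group(1), match.group(2).strip()
        if kind == "think":
            context.append(("thought", body))
        elif kind == "tool_search":
            context.append(("tools", search_fn(body)))
        elif kind == "tool_call":
            call = json.loads(body)  # expects {"name": ..., "args": {...}}
            context.append(("result", call_fn(call["name"], call["args"])))
        elif kind == "fold":
            context = [("memory", fold_fn(context))]  # replace history with folded state
    return context


steps = [
    "<think>I need a movie tool.</think>",
    "<tool_search>find movies by year</tool_search>",
    '<tool_call>{"name": "movie_search", "args": {"year": 2010}}</tool_call>',
    "<fold></fold>",
    "Inception",
]
ctx = run_agent(
    steps,
    search_fn=lambda query: ["movie_search"],
    call_fn=lambda name, args: f"{name} -> 3 results",
    fold_fn=lambda history: f"{len(history)} events folded",
)
print(ctx)  # [('memory', '3 events folded'), ('answer', 'Inception')]
```

Search, call, and fold are all just actions the model chooses inside the same stream, so the order and frequency of tool use are decided by the policy rather than hard-coded in the loop.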

DeepAgent is a practical step toward agent architectures that do not depend on fixed tool prompts, because it unifies autonomous reasoning, dense retrieval over more than 16,000 RapidAPI tools and the ToolHop tools, structured tool calling, and memory folding in a single loop. The use of LLM-simulated APIs in ToolPO is an engineering choice, but it addresses the latency and instability problems that hurt earlier tool agents. The evaluation shows consistent 32B-level gains in both the labeled-tool and open-set settings, not isolated peaks. This release makes large toolspaces practically usable for LLM agents. Overall, DeepAgent supports the view that end-to-end tool agents with memory and RL are emerging as the default pattern.

Check out the Paper and GitHub Repo.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform draws more than two million monthly views, which shows its popularity among readers.
