PORTool: Value-Aware Policy Optimization with a Multi-Tool-Integrated Reasoning Reward Tree

Multi-tool-integrated reasoning enables LLM-based agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents with only outcome rewards leads to credit-assignment ambiguity: it is unclear which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, a value-aware policy-optimization algorithm that improves the agent's tool-use ability by supervising at the outcome level while assigning rewards at the step level. Specifically, PORTool generates a branching rollout tree in which trajectories share a common prefix before diverging, allowing direct comparison between alternative tool-use decisions within the same context. It then evaluates the importance of each step with a correctness signal, that is, whether the descendants of that step can eventually produce the correct final answer, and an auxiliary signal that indicates whether the step's tool calls succeed. Using these step-level weights, PORTool updates the policy toward effective tool-calling behavior, guided both by local comparisons among the branches at each decision point and by the overall quality of all trajectories. Experiments show that PORTool improves final-answer accuracy while reducing the number of tool-call steps compared with state-of-the-art baselines, and ablation studies confirm the contribution of the step-wise reward signals.
- † Purdue University
- ** Work done while at Apple
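To make the step-level scoring concrete, the following is a minimal sketch (not the paper's implementation) of how a correctness signal and a tool-call-success signal could be combined on a shared-prefix rollout tree: a step's correctness is whether any descendant trajectory reaches the correct final answer, blended with whether its own tool call succeeded. The `Node` structure, `step_reward` function, and the mixing weight `alpha` are all illustrative assumptions, not names from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One reasoning/tool-call step in a shared-prefix rollout tree (hypothetical)."""
    tool_call_ok: bool                    # auxiliary signal: did the tool call succeed?
    correct_leaf: Optional[bool] = None   # set on leaves: is the final answer correct?
    children: List["Node"] = field(default_factory=list)

def leads_to_correct(node: Node) -> bool:
    """Does any trajectory descending from this step reach the correct answer?"""
    if node.correct_leaf is not None:     # leaf node: use its own outcome
        return node.correct_leaf
    return any(leads_to_correct(c) for c in node.children)

def step_reward(node: Node, alpha: float = 0.8) -> float:
    """Illustrative step-level reward: a weighted mix of the correctness
    signal and the tool-call-success signal. `alpha` is an assumed weight."""
    return alpha * float(leads_to_correct(node)) + (1 - alpha) * float(node.tool_call_ok)

# Two branches diverging from the same step: one reaches the correct answer.
good_leaf = Node(tool_call_ok=True, correct_leaf=True)
bad_leaf = Node(tool_call_ok=False, correct_leaf=False)
root = Node(tool_call_ok=True, children=[good_leaf, bad_leaf])
print(step_reward(root), step_reward(good_leaf), step_reward(bad_leaf))
```

Because both branches share the prefix ending at `root`, their rewards are directly comparable within the same context, which is the kind of local comparison the abstract describes.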



