PORTool: Value-Aware Policy Optimization with a Multi-Tool-Integrated Reasoning Reward Tree

Multi-tool-integrated reasoning enables LLM-based agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents with only outcome rewards leads to credit-assignment ambiguity: it is unclear which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, a value-aware policy-optimization algorithm that improves the agent's tool-use ability by supervising at the outcome level while assigning rewards at the step level. Specifically, PORTool generates a branching rollout tree in which trajectories share a common prefix before diverging, allowing direct comparison between alternative tool-use decisions within the same context. It then evaluates the importance of each step with a correctness signal, that is, whether the descendants of that step can eventually produce the correct final answer, and an auxiliary signal that indicates whether the step's tool calls succeed. Using these step-level weights, PORTool updates the policy toward effective tool-calling behavior, guided both by local comparisons among the branches at each decision point and by the overall quality of all trajectories. Experiments show that PORTool improves final-answer accuracy while reducing the number of tool-call steps compared with state-of-the-art baselines, and ablation studies confirm the contribution of the step-wise reward signals.
- † Purdue University
- ** Work done while at Apple
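To make the step-level scoring concrete, the following is a minimal sketch (not the paper's implementation) of how a correctness signal and a tool-call-success signal could be combined on a shared-prefix rollout tree: a step's correctness is whether any descendant trajectory reaches the correct final answer, blended with whether its own tool call succeeded. The `Node` structure, `step_reward` function, and the mixing weight `alpha` are all illustrative assumptions, not names from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One reasoning/tool-call step in a shared-prefix rollout tree (hypothetical)."""
    tool_call_ok: bool                    # auxiliary signal: did the tool call succeed?
    correct_leaf: Optional[bool] = None   # set on leaves: is the final answer correct?
    children: List["Node"] = field(default_factory=list)

def leads_to_correct(node: Node) -> bool:
    """Does any trajectory descending from this step reach the correct answer?"""
    if node.correct_leaf is not None:     # leaf node: use its own outcome
        return node.correct_leaf
    return any(leads_to_correct(c) for c in node.children)

def step_reward(node: Node, alpha: float = 0.8) -> float:
    """Illustrative step-level reward: a weighted mix of the correctness
    signal and the tool-call-success signal. `alpha` is an assumed weight."""
    return alpha * float(leads_to_correct(node)) + (1 - alpha) * float(node.tool_call_ok)

# Two branches diverging from the same step: one reaches the correct answer.
good_leaf = Node(tool_call_ok=True, correct_leaf=True)
bad_leaf = Node(tool_call_ok=False, correct_leaf=False)
root = Node(tool_call_ok=True, children=[good_leaf, bad_leaf])
print(step_reward(root), step_reward(good_leaf), step_reward(bad_leaf))
```

Because both branches share the prefix ending at `root`, their rewards are directly comparable within the same context, which is the kind of local comparison the abstract describes.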



