UltraCUA: A Foundation Computer-Use Agent Model That Bridges the Gap Between General-Purpose GUI Agents and Specialized API-Based Agents

Computer-use agents have been limited to primitives. They click, they type, they scroll. Long action chains amplify grounding errors and waste steps. Apple researchers introduce UltraCUA, a foundation model that builds a hybrid action space in which the agent can interleave low-level GUI actions with high-level programmatic tool calls. The model chooses the cheaper and more reliable move at each step. The approach improves success rates and reduces step counts on OSWorld, and it transfers to WindowsAgentArena without Windows-specific training.

What does hybrid action change?

Hybrid action treats tools as first-class actions. A tool call encapsulates a multi-step operation as a single function with a clear signature and docstring. A click or key press remains available when no programmatic path exists. The agent learns to alternate between the two modes. The goal is to reduce cascading errors and cut step counts. The research team positions this as a bridge between GUI-only computer-use agents and tool-centric agent frameworks.
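To make the idea concrete, here is a minimal sketch of a hybrid action space, assuming a simple representation in which each step is either a GUI primitive or a named tool call. The class names, the `set_font_size` tool, and the screen coordinates are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class GuiAction:
    """A low-level primitive such as a click, a key press, or a scroll."""
    kind: str
    args: Tuple = ()


@dataclass
class ToolCall:
    """A high-level call that stands in for a multi-step GUI sequence."""
    name: str
    kwargs: Dict[str, object] = field(default_factory=dict)


Action = Union[GuiAction, ToolCall]


def execute(action: Action, tools: Dict[str, Callable[..., str]], log: List[str]) -> None:
    """Dispatch one step: run the tool for a tool call, else record the primitive."""
    if isinstance(action, ToolCall):
        log.append(tools[action.name](**action.kwargs))
    else:
        log.append(f"gui:{action.kind}{action.args}")


def set_font_size(size: int) -> str:
    """Hypothetical tool: set the font size of the current selection."""
    return f"tool:set_font_size({size})"


# The same goal expressed two ways: four primitives versus one tool call.
gui_only = [
    GuiAction("click", (412, 88)),   # made-up coordinates of a font-size box
    GuiAction("type", ("14",)),
    GuiAction("key", ("Enter",)),
    GuiAction("click", (600, 400)),  # return focus to the document
]
hybrid = [ToolCall("set_font_size", {"size": 14})]

log: List[str] = []
for step in hybrid:
    execute(step, {"set_font_size": set_font_size}, log)
print(len(gui_only), len(hybrid), log)
```

Every primitive in the GUI-only chain is a separate chance to misplace a click, which is the kind of cascading error the single tool call avoids.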

Scaled tool acquisition

UltraCUA builds its tool library with an automated pipeline. The pipeline extracts keyboard shortcuts and commands from software documentation. It integrates open-source implementations from agent toolkits. It also uses coding agents to synthesize new tools. Each tool is a callable interface that hides a long GUI sequence. The research team reports coverage across 10 desktop domains with 881 tools. The largest buckets include VS Code with 135 tools and LibreOffice Writer with 123 tools. Thunderbird and GIMP also have deep coverage.
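As an illustration of "a callable interface that hides a long GUI sequence", the sketch below registers two tools and renders their signatures and docstrings the way a prompt might list them. The registry, the tool names, and the expanded step strings are assumptions made for this example; only the general idea (a shortcut or a multi-step operation wrapped as one function) comes from the article.

```python
import inspect
from typing import Callable, Dict, List

TOOLS: Dict[str, Callable[..., List[str]]] = {}


def tool(fn):
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def vscode_toggle_terminal() -> List[str]:
    """Show or hide the integrated terminal (wraps a keyboard shortcut)."""
    return ["hotkey ctrl+`"]


@tool
def writer_find_replace(find: str, replace: str) -> List[str]:
    """Replace every match of `find` with `replace` in the open document."""
    # Hypothetical expansion into the GUI steps that the tool hides.
    return ["hotkey ctrl+h", f"type {find}", "key Tab", f"type {replace}",
            "click Replace All"]


def describe(names: List[str]) -> str:
    """Render name, signature, and docstring for each listed tool."""
    lines = []
    for name in names:
        fn = TOOLS[name]
        lines.append(f"{name}{inspect.signature(fn)}: {inspect.getdoc(fn)}")
    return "\n".join(lines)


print(describe(sorted(TOOLS)))
print(writer_find_replace("teh", "the"))
```

Passing only a subset of names to `describe` is one simple way to limit how many tools are exposed to the model at once.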

Verifiable tasks and trajectories

Training requires grounded supervision and stable rewards. UltraCUA uses a dual synthetic data engine. One pipeline composes atomic verifiers for browsers, files, images, and system state, then generates tasks that satisfy those checks. The other pipeline explores the OS and proposes context-aligned tasks, which are then verified. The result is 17,864 verifiable tasks across 10 domains, including Chrome, LibreOffice, GIMP, VS Code, and other applications. Chrome has 2,826 tasks. The LibreOffice suite accounts for 5,885 tasks. Multi-app tasks number 2,113.
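The verifier-first pipeline can be pictured as composing small checks into a task's success condition. The following is a minimal sketch under that assumption; the state dictionary, the verifier names, and the example task are all hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Verifier = Callable[[State], bool]


def file_exists(path: str) -> Verifier:
    """Atomic check: the given path is present in the recorded file list."""
    return lambda s: path in s.get("files", [])


def url_open(url: str) -> Verifier:
    """Atomic check: the given URL is open in some browser tab."""
    return lambda s: url in s.get("tabs", [])


def all_of(checks: List[Verifier]) -> Verifier:
    """Compose atomic checks into one pass/fail condition."""
    return lambda s: all(c(s) for c in checks)


# A task is an instruction paired with the composed verifier that defines success.
task = {
    "instruction": "Open example.org in Chrome and save the page as notes.html in Documents.",
    "verify": all_of([url_open("https://example.org"),
                      file_exists("~/Documents/notes.html")]),
}

done = {"tabs": ["https://example.org"], "files": ["~/Documents/notes.html"]}
partial = {"tabs": ["https://example.org"], "files": []}
print(task["verify"](done), task["verify"](partial))  # True False
```

Because success is decided by the checks rather than by a judge model, the same verifier can supply a stable reward signal during reinforcement learning.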

A multi-agent rollout produces successful hybrid trajectories. The planner uses OpenAI o3 for decision making. The grounder uses GTA1-7B for accurate visual grounding. The rollout yields about 26.8K successful trajectories that show when to use a tool and when to act in the GUI. These trajectories are the core of the supervised stage.
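The planner and grounder split can be sketched as a loop in which the planner picks the next step and the grounder is consulted only when that step is a GUI action needing screen coordinates. The code below is a toy version with scripted stand-ins for both models; the step format, the function names, and the default step cap are assumptions, not the paper's interface.

```python
from typing import Callable, Dict, List, Tuple

Step = Dict[str, object]


def rollout(plan: Callable[[str, List[Step]], Step],
            ground: Callable[[str], Tuple[int, int]],
            goal: str, max_steps: int = 15) -> List[Step]:
    """Alternate planning and grounding until the planner reports it is done."""
    history: List[Step] = []
    for _ in range(max_steps):
        step = plan(goal, history)
        if step["type"] == "done":
            break
        if step["type"] == "gui":
            # Only GUI steps need a pixel target; tool calls run as-is.
            step = {**step, "xy": ground(str(step["target"]))}
        history.append(step)
    return history


def scripted_planner(goal: str, history: List[Step]) -> Step:
    """Stand-in for the planner model: replays a fixed three-step script."""
    script: List[Step] = [
        {"type": "tool", "name": "open_file", "args": {"path": "report.odt"}},
        {"type": "gui", "op": "click", "target": "Bold button"},
        {"type": "done"},
    ]
    return script[len(history)]


def fake_grounder(target: str) -> Tuple[int, int]:
    """Stand-in for the grounder model: always returns the same coordinates."""
    return (120, 64)


def keep_successful(trajs: List[List[Step]],
                    verify: Callable[[List[Step]], bool]) -> List[List[Step]]:
    """Keep only trajectories whose outcome passes the task verifier."""
    return [t for t in trajs if verify(t)]


traj = rollout(scripted_planner, fake_grounder, "Make the title bold in report.odt")
print(traj)
print(len(keep_successful([traj, []], lambda t: len(t) > 0)))
```

Filtering by a verifier, as in `keep_successful`, reflects the statement that only successful trajectories feed the supervised stage.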

Training method

Training has two stages. Stage 1 is supervised fine-tuning. The models train for 3 epochs at a learning rate of 2e-5 on successful trajectories. The loss is applied turn-wise to avoid over-weighting early steps. Stage 2 is online reinforcement learning. The models train for 150 steps at a learning rate of 1e-6 on verified tasks sampled by difficulty. Policy optimization follows a GRPO variant with clip-higher and removes KL regularization and format rewards. The reward combines a sparse task outcome with a tool-use term. Experiments use NVIDIA H100 GPUs. The context is kept near 32K by controlling the number of exposed tools.
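The reinforcement stage can be sketched in a few lines: a reward that adds a tool-use term to the sparse outcome, group-relative advantages in the style of GRPO, and a clipped surrogate with a wider upper bound and no KL penalty. The bonus weight, the rule that the bonus applies only on success, and the clip values 0.2 and 0.28 are assumptions chosen for illustration; the article does not state them.

```python
from statistics import mean, pstdev
from typing import List


def reward(task_solved: bool, used_tool: bool, tool_bonus: float = 0.1) -> float:
    """Sparse outcome reward plus a small tool-use term (weights are assumed)."""
    return float(task_solved) + (tool_bonus if task_solved and used_tool else 0.0)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each reward within its rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_term(ratio: float, adv: float,
                 eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style surrogate with an asymmetric clip range and no KL term."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * adv, clipped * adv)


# Four rollouts of the same task: (solved, used a tool).
group = [(True, True), (True, False), (False, False), (False, True)]
rewards = [reward(s, t) for s, t in group]
advs = group_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advs])
print(clipped_term(1.5, 1.0), clipped_term(0.5, -1.0))
```

With a positive advantage, the wider upper bound lets the probability ratio grow to 1.28 before the gradient is cut off, compared with 1.2 under a symmetric clip of 0.2.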

Results on OSWorld

UltraCUA improves success at both the 7B and 32B scales. Under a 15-step budget, UltraCUA-32B reaches a 41.0 percent success rate. OpenCUA-32B reaches 29.7 percent. The absolute gain is 11.3 points. UltraCUA-7B reaches 28.9 percent. UI-TARS-1.5-7B reaches 23.4 percent. Gains persist under 50-step budgets. A per-domain breakdown shows consistent lifts across Chrome, Writer, VS Code, and cross-application tasks. Average step counts decrease against the baselines. These shifts indicate better action selection rather than simply more attempts.
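The gains quoted above are simple to check. This short snippet recomputes the absolute gain in points and the corresponding relative gain from the four 15-step success rates given in this section; the relative figures are derived here from those numbers and are not quoted from the paper.

```python
# 15-step OSWorld success rates quoted in this section (percent).
scores = {
    "UltraCUA-32B": 41.0,
    "OpenCUA-32B": 29.7,
    "UltraCUA-7B": 28.9,
    "UI-TARS-1.5-7B": 23.4,
}


def gain(model: str, baseline: str):
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = scores[model] - scores[baseline]
    relative = 100.0 * absolute / scores[baseline]
    return round(absolute, 1), round(relative, 1)


print(gain("UltraCUA-32B", "OpenCUA-32B"))    # (11.3, 38.0)
print(gain("UltraCUA-7B", "UI-TARS-1.5-7B"))  # (5.5, 23.5)
```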

Cross-platform transfer to WindowsAgentArena

UltraCUA trains only on Ubuntu-based OSWorld data. The model is then evaluated on WindowsAgentArena. UltraCUA-7B reaches a 21.7 percent success rate. This exceeds UI-TARS-1.5-7B at 18.1 percent and a Qwen2 baseline trained with Windows data at 13.5 percent. The result suggests that hybrid action strategies learned on one platform transfer to others. The paper highlights this as zero-shot platform generalization.

Key takeaways

UltraCUA formalizes a hybrid action space that lets a single agent alternate between GUI primitives and programmatic tool calls, which reduces long, error-prone action chains.
The research team scales a reusable tool library through an automated pipeline and pairs it with a synthetic data engine, yielding more than 17,000 verifiable tasks for training and evaluation.
Training follows a two-stage recipe: supervised fine-tuning on successful hybrid trajectories, then online reinforcement learning on verifiable tasks, which optimizes when to call tools and when to act in the GUI.
On OSWorld, UltraCUA reports an average relative improvement of 22 percent over base models with 11 percent fewer steps, indicating gains in both reliability and efficiency.
The 7B model reaches a 21.7 percent success rate on WindowsAgentArena without Windows-specific training, which shows cross-platform generalization of the hybrid action policy.

UltraCUA moves computer-use agents from brittle primitive action chains to a hybrid action policy that combines GUI primitives with programmatic tool calls. It scales tools through an automated pipeline and pairs them with a synthetic data engine that yields more than 17,000 verifiable tasks, enabling supervised fine-tuning and online reinforcement learning on grounded signals. The reported results include a 22 percent relative improvement on OSWorld with 11 percent fewer steps, and a 21.7 percent success rate on WindowsAgentArena without Windows-specific training, which indicates cross-platform transfer of the policy.

Michal Sutter is a data scientist with a Master of Science in Data Science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex data into actionable findings.
