Salesforce AI Releases GTA1: A Test-Time Scaled GUI Agent That Outperforms OpenAI's CUA

Salesforce AI Research has introduced GTA1, a graphical user interface (GUI) agent designed to operate autonomously in real operating system environments such as Linux. GTA1 addresses two critical bottlenecks in GUI agent development: ambiguous task planning and inaccurate grounding of actions. With a 45.2% task success rate on the OSWorld benchmark, GTA1 surpasses OpenAI's CUA (Computer-Using Agent) and sets a new record among open-source models.
Core Challenges in GUI Agents
GUI agents typically translate high-level user instructions into action sequences (clicks, keystrokes, or other UI interactions) while observing UI updates after each action to plan the next step. However, two issues persist:
- Planning ambiguity: Multiple valid action sequences can accomplish the same task, leading to execution paths that vary in efficiency and reliability.
- Grounding precision: Translating abstract action proposals into accurate, coordinate-level GUI interactions is especially challenging in high-resolution, dynamic interfaces.
GTA1 introduces novel mechanisms to resolve both.
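The observe, plan, ground, act cycle described above can be sketched as a small loop. This is a minimal illustration, not GTA1's implementation: `Action`, `run_episode`, and the `planner`, `grounder`, `observe`, and `execute` callables are hypothetical names introduced here, and the screenshot is treated as opaque bytes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """A high-level action proposal (hypothetical structure)."""
    kind: str          # "click", "type", or "done"
    target: str = ""   # natural-language description of the UI element
    text: str = ""     # text to type, if any


def run_episode(
    task: str,
    planner: Callable[[str, bytes, List[Action]], Action],
    grounder: Callable[[bytes, str], Tuple[int, int]],
    observe: Callable[[], bytes],
    execute: Callable[[tuple], None],
    max_steps: int = 15,
) -> List[Action]:
    """Run one task: observe the UI, plan an action, ground it, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = observe()                       # view the current UI state
        action = planner(task, screenshot, history)  # planning stage
        if action.kind == "done":
            break
        if action.kind == "click":
            # Grounding stage: element description -> screen coordinates.
            x, y = grounder(screenshot, action.target)
            execute(("click", x, y))
        else:
            execute((action.kind, action.text))
        history.append(action)
    return history
```

Keeping the planner and the grounder as separate callables mirrors the two problems listed above: the planner decides what to do next, and the grounder decides where on the screen to do it.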
Smarter Planning via Test-Time Scaling
Traditional planners commit to a single action proposal at each decision point, which limits robustness. GTA1's test-time scaling introduces a simple yet effective solution: sample multiple candidate actions at each step, and employ a multimodal judge model, typically a large language model, to evaluate and select the best one.
This method avoids premature commitment to suboptimal plans and lets the agent explore execution paths without requiring future rollouts, which are infeasible in GUI environments. Importantly, the approach can be applied to any planner and scales well with task complexity and the size of the action space.
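As a rough sketch of the sample-then-judge step, assume the planner is exposed as a zero-argument sampler and the judge as a function that returns the index of its preferred candidate. Both interfaces, and the name `select_action`, are assumptions made for illustration; here candidates are sampled sequentially for simplicity.

```python
from typing import Callable, List, Sequence


def select_action(
    sample_candidate: Callable[[], str],
    judge: Callable[[Sequence[str]], int],
    k: int = 8,
) -> str:
    """Sample k candidate actions and return the one the judge prefers."""
    candidates: List[str] = [sample_candidate() for _ in range(k)]
    # Drop duplicate proposals while preserving order, so the judge
    # compares each distinct action only once.
    unique = list(dict.fromkeys(candidates))
    best_index = judge(unique)
    return unique[best_index]
```

Because the judge only compares proposals for the current step, no action has to be executed before a choice is made, which is why no future rollout is needed. The value of `k` is an arbitrary placeholder.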
Reinforcement Learning for Grounding Accuracy
For GUI grounding, most prior models rely on supervised fine-tuning to predict the center of the target element, which limits generalization. GTA1 adopts a reinforcement learning (RL) framework based on Group Relative Policy Optimization (GRPO). Instead of relying on intermediate reasoning (“thinking”) or predicting bounding boxes, the model learns directly from click-based rewards: it is rewarded only when the predicted coordinate falls within the correct UI element.
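The click-based reward can be written in a few lines. The sketch below also shows the group-relative normalization that gives GRPO its name, under the assumption that advantages are computed as each reward minus the group mean, divided by the group standard deviation. The function names, the box layout, and the choice of population standard deviation are illustrative assumptions, not details from the GTA1 code.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def click_reward(point: Tuple[float, float], box: Box) -> float:
    """Return 1.0 if the predicted click lands inside the target element, else 0.0."""
    x, y = point
    x0, y0, x1, y1 = box
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of sampled predictions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For one instruction, several click predictions are sampled, each is scored with `click_reward`, and the normalized scores weight the policy update. If every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.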
With this reward structure, GTA1 reaches state-of-the-art accuracy without the complexity or overhead of chain-of-thought-style supervision. Notably, an ablation study shows that removing auxiliary signals such as “thinking” or IoU-based box rewards actually improves grounding performance, especially in static environments.
Performance Across Benchmarks

GTA1 sets a new standard on several evaluations:
- OSWorld (Task Success Rate): GTA1-7B reaches 45.2%, outperforming OpenAI's CUA (42.9%) and Claude 3.7 (28.0%).
- ScreenSpot-Pro (Grounding Accuracy): GTA1-7B scores 50.1%, ahead of models like UGround-72B (34.5%).
- ScreenSpot-V2 (Cross-Platform Grounding): GTA1-72B hits 94.8%, comparable to the top-performing models.
- OSWorld-G (Linux GUI Grounding): GTA1-7B reaches 67.7%, ahead of all prior open-source approaches.
These results validate the effectiveness of both the planning and grounding innovations introduced in GTA1.
Additional Design Highlights
- Data cleaning: Misaligned annotations from datasets such as Aria-UI and OS-Atlas are filtered using OmniParser to improve the training signal.
- Model scaling: The approach scales well across models from 7B to 72B parameters, with GTA1-7B offering the best trade-off between performance and compute.
- Judge reusability: The multimodal judge used in test-time scaling can be the same LLM used for planning, reducing overhead.
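One plausible way to implement the data-cleaning step is to keep a training sample only when its annotated box overlaps well with some element found by a UI detector. In the sketch below the detector output is passed in as a plain list of boxes, so no OmniParser API is assumed. The IoU criterion and the `min_iou` value are illustrative assumptions, not figures from the article.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_sample(annotated: Box, detected: List[Box], min_iou: float = 0.3) -> bool:
    """Keep an annotation only if it matches at least one detected UI element."""
    return any(iou(annotated, d) >= min_iou for d in detected)
```

Samples whose annotated box matches no detected element are treated as misaligned and dropped. A stricter threshold removes more noise at the cost of discarding more data.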
Conclusion
GTA1 shows that robust and accurate GUI agents can be built using a modular two-stage framework, enhanced by test-time planning diversity and direct RL-based grounding. By forgoing unnecessary complexity, such as chain-of-thought reasoning in static tasks, Salesforce AI has introduced a lean, effective agent architecture that advances open-ended digital interaction.
Check out the paper, the code, and the 7B, 32B, and 72B models. All credit for this research goes to the researchers of this project.


