Reactive Machines

Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents

Developing autonomous agents that effectively interact with Graphical User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Using techniques optimized for developing small models, we build our 3B Ferret-UI Lite agent by curating a diverse mixture of GUI data from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool use, and applying reinforcement learning with designed rewards. Ferret-UI Lite achieves performance competitive with other small GUI agents. In GUI grounding, Ferret-UI Lite scores 91.6%, 53.3%, and 61.2% on the ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. In GUI navigation, Ferret-UI Lite achieves success rates of 28.0% on AndroidWorld and 19.8% on OSWorld. We share our methods and lessons learned from developing compact, on-device GUI agents.
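The ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G figures are accuracy-style scores, while the AndroidWorld and OSWorld figures are task success rates. As a rough illustration only (this is not the paper's evaluation code), the sketch below assumes the common point-in-box convention for grounding, where a predicted click counts as correct if it lands inside the target element's bounding box. The function names and the `(left, top, right, bottom)` box layout are hypothetical choices made for this example.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (left, top, right, bottom)


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the predicted click falls inside the target box (edges included)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of predicted clicks that land inside their target boxes."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return hits / len(predictions)


def success_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of navigation episodes marked successful."""
    results = list(outcomes)
    if not results:
        return 0.0
    return sum(results) / len(results)


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from any benchmark.
    clicks = [(5.0, 5.0), (50.0, 50.0)]
    boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    print(f"grounding accuracy: {grounding_accuracy(clicks, boxes):.1%}")
    print(f"success rate: {success_rate([True, False, False, False]):.1%}")
```

Each benchmark defines its own targets and success checks, so a real evaluation should rely on the official harness for that benchmark rather than on a sketch like this.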
