
Scaling data annotation using visual language models to enable physical AI systems

A severe labor shortage is hampering growth across industries including transportation, construction, and agriculture. The problem is acute in construction: nearly 500,000 positions remain unfilled in the United States, and 40% of the current workforce is expected to retire within a decade. These workforce constraints cause project delays, increased costs, and postponed development plans. To address these challenges, organizations are developing autonomous systems that can close labor gaps, increase operational efficiency, and deliver round-the-clock productivity.

Building autonomous systems requires large, annotated datasets to train AI models, and the quality of that training determines whether these systems deliver business value. The bottleneck is the high cost of data preparation: labeling video data, such as identifying equipment, activities, and environmental conditions, is essential to make the data useful for model training. This bottleneck can stall model development, delaying the delivery of AI-powered products and services to customers. For construction companies managing millions of hours of video, manual data preparation and annotation is impractical. Visual language models (VLMs) help address this by interpreting images and video, answering natural language queries, and generating descriptions at a speed and scale that manual processes cannot match, providing a cost-effective alternative.
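As a concrete illustration of this pattern, the following sketch pairs one extracted video frame with a natural language question through the Amazon Bedrock Converse API. The model ID, prompt wording, and function names are illustrative assumptions for this post, not Bedrock Robotics' actual pipeline.

```python
def build_annotation_request(frame_bytes: bytes,
                             model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> dict:
    """Assemble a Converse API request pairing one video frame with a question.

    The model ID is a placeholder; any Bedrock-hosted vision model could be used.
    """
    return {
        "modelId": model_id,
        "messages": [{
            "role": "user",
            "content": [
                # Frames are passed as raw JPEG bytes alongside the text query.
                {"image": {"format": "jpeg", "source": {"bytes": frame_bytes}}},
                {"text": "Describe the machine, its tool attachment, "
                         "and the work site conditions in this frame."},
            ],
        }],
    }


def annotate_frame(frame_bytes: bytes) -> str:
    import boto3  # deferred so the request builder is usable without AWS credentials

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_annotation_request(frame_bytes))
    # The Converse API returns the generated text under output.message.content.
    return response["output"]["message"]["content"][0]["text"]
```

Running this per sampled frame (or per short clip, for models that accept video) turns raw footage into free-text descriptions that downstream steps can parse into labels.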

In this post, we explore how Bedrock Robotics is tackling this challenge. Through the AWS Physical AI Fellowship, the startup partnered with the AWS Generative AI Innovation Center to use visual language models to analyze construction video, extract operational information, and generate labeled training datasets at scale, accelerating data processing for autonomous construction machines.

Bedrock Robotics: a case study for accelerating autonomous construction

Since 2024, Bedrock Robotics has been developing autonomous systems for construction machines. The company's product, Bedrock Operator, combines hardware and AI models so that excavators and other machines can operate with minimal human intervention. These systems perform tasks such as digging, grading, and material handling with centimeter-level accuracy. Training the underlying models requires large amounts of video showing equipment, tasks, and environments, a resource-intensive process that limits scalability.

VLMs provide a solution by analyzing image and video data and generating textual descriptions. This makes them well suited for annotation tasks, which are essential for teaching models to relate observed patterns to human language. Bedrock Robotics used this technology to drive the preparation of training data for its AI models, enabling autonomous machine operation. With the right model selection and prompt engineering, the company improved tool attachment identification accuracy from 34% to 70%, transforming a manual, time-consuming process into an automated, scalable data pipeline. This success accelerated the deployment of autonomous systems.

This approach provides a replicable framework for organizations facing similar data challenges and demonstrates how strategic investments in foundation models (FMs) can deliver measurable performance results and competitive advantage. Foundation models are models trained on large amounts of data, typically with self-supervised learning techniques, that learn general representations adaptable to many downstream tasks. VLMs apply this large-scale pre-training to combine visual and textual modalities, allowing them to understand, analyze, and generate content in both visual and linguistic formats.

In the following sections, we look at the process Bedrock Robotics used to annotate millions of hours of video footage and accelerate innovation using a VLM-based solution.

From unstructured video data to strategic assets using VLMs

Enabling autonomous operation requires extracting useful information from millions of hours of raw operational footage. Specifically, Bedrock Robotics needed to identify tool attachments, tasks, and work site conditions across a variety of situations. The following images are sample video frames from this dataset.

Construction machines work with many tool attachments, each of which requires precise classification to train reliable AI models. In collaboration with the Innovation Center, Bedrock Robotics focused its annotation efforts on several key tool categories: lifting hooks for material handling, concrete breakers, leveling beams for surface grading, and chipping buckets for excavation.

These labels allow Bedrock Robotics to select relevant video segments and compile a training dataset representing various machine configurations and operating conditions.
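A minimal sketch of that selection step follows. The segment records, tool names, and cap per category are hypothetical stand-ins for Bedrock Robotics' internal annotation schema; the point is that per-segment labels make it trivial to compile a dataset covering each configuration.

```python
from collections import defaultdict


def balance_by_tool(segments, per_tool=2):
    """Keep at most `per_tool` labeled segments per tool attachment category."""
    buckets = defaultdict(list)
    for seg in segments:
        buckets[seg["tool"]].append(seg)
    dataset = []
    for segs in buckets.values():
        dataset.extend(segs[:per_tool])  # cap each category to avoid imbalance
    return dataset


# Hypothetical VLM-labeled segments from the annotation pipeline.
segments = [
    {"clip": "a.mp4", "tool": "lifting hook"},
    {"clip": "b.mp4", "tool": "concrete breaker"},
    {"clip": "c.mp4", "tool": "lifting hook"},
    {"clip": "d.mp4", "tool": "lifting hook"},
]
print(len(balance_by_tool(segments)))  # 3: two lifting hooks plus one breaker
```

The same grouping idea extends to other label axes, such as task type or site conditions, when compiling a representative training set.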

Accelerating AI deployment through strategic model selection

Off-the-shelf VLMs (that is, VLMs used without prompt engineering) struggle with construction video data because they are trained largely on web images, not footage from active job sites. They can fail on unusual camera angles, machine-specific views, and visibility problems caused by dust and weather. They also lack the domain knowledge to distinguish visually similar tools, such as digging buckets from chipping buckets.

Bedrock Robotics and the Innovation Center addressed this through targeted model selection and prompt optimization. The teams evaluated multiple VLMs, including open source options and FMs available on Amazon Bedrock, and refined prompts with detailed visual descriptions of each tool, guidance on distinguishing tools that are often confused, and step-by-step instructions for analyzing video frames.
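The prompt-refinement pattern described above can be sketched as a template with three parts: per-tool visual definitions, an explicit disambiguation hint, and stepwise instructions. The descriptions below are illustrative assumptions, not Bedrock Robotics' actual prompts.

```python
# Illustrative visual definitions for each tool category from the post.
TOOL_GUIDE = {
    "lifting hook": "curved steel hook hanging from the bucket linkage, used for rigging loads",
    "concrete breaker": "vertical hydraulic hammer with a narrow chisel tip",
    "leveling beam": "wide horizontal beam dragged across soil to flatten it",
    "chipping bucket": "toothed bucket angled for breaking and scooping material",
}

# Hint targeting a confusion pair; real prompts would cover each known pair.
DISAMBIGUATION = (
    "A chipping bucket has teeth and a wide opening; a concrete breaker "
    "is a single narrow chisel. Do not confuse the two."
)


def build_prompt(guide=TOOL_GUIDE, hints=DISAMBIGUATION):
    """Compose a classification prompt: definitions, hints, then step-by-step instructions."""
    lines = ["You are labeling the tool attachment on a construction machine."]
    lines.append("Tool definitions:")
    for name, desc in guide.items():
        lines.append(f"- {name}: {desc}")
    lines.append(hints)
    lines.append("Steps: 1) locate the machine arm, 2) describe the attachment, "
                 "3) answer with exactly one tool name from the list.")
    return "\n".join(lines)
```

Constraining the answer to a closed set of tool names (step 3) is what makes the output machine-parseable for the downstream labeling pipeline.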

These changes improved classification accuracy from 34% to 70% on a test set of 130 videos, at a cost of $10 per hour of video processed. The results show how prompt engineering can adapt VLMs to specialized tasks. For Bedrock Robotics, this approach delivered faster training cycles, reduced deployment time, and a cost-effective annotation pipeline that scales with operational needs.
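The evaluation behind these numbers reduces to comparing VLM tool labels against human ground truth over the held-out videos. A minimal sketch, with made-up labels standing in for the real test set:

```python
def classification_accuracy(predictions, ground_truth):
    """Fraction of videos whose predicted tool label matches the human annotation."""
    assert len(predictions) == len(ground_truth), "one prediction per labeled video"
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Hypothetical labels for four test videos; the real test set had 130.
preds = ["lifting hook", "chipping bucket", "concrete breaker", "leveling beam"]
truth = ["lifting hook", "chipping bucket", "leveling beam", "leveling beam"]
print(classification_accuracy(preds, truth))  # 0.75
```

Tracking this single number across prompt revisions is what made the 34%-to-70% improvement measurable.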

The way forward: tackling the labor shortage with automation

For Bedrock Robotics, visual language models enabled rapid identification and extraction of key datasets, surfacing the necessary insights from large-scale construction video. With an overall accuracy of 70%, this cost-effective method provides a practical foundation for scaling the generation of model training data. It shows how AI innovation can overcome workforce barriers and accelerate industry transformation. Organizations that streamline data processing can accelerate autonomous system deployments, reduce operating costs, and pursue new growth in industries affected by labor shortages. Manufacturing leaders and industry practitioners facing similar challenges can apply this replicable framework to drive competitive differentiation in their own domains.

To learn more, visit Bedrock Robotics or explore physical AI resources on AWS.


About the authors

Laura Kulowski

Laura Kulowski is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where she works to develop physical AI solutions. Before joining Amazon, Laura completed her PhD in Harvard's Department of Earth and Planetary Sciences, where she investigated Jupiter's deep zonal flows and magnetic field using Juno data.

Alla Simoneau

Alla Simoneau is a technology and business leader with over 15 years of experience, currently serving as the Emerging Technology Physical AI Lead at Amazon Web Services (AWS), where she drives global innovation at the intersection of AI and real-world applications. With more than a decade at Amazon, Alla is a recognized leader in strategy, team building, and efficiency, focused on turning cutting-edge technology into real-world change for startups and enterprise customers.

Parmida Atighehchian

Parmida Atighehchian is a Senior Data Scientist at the AWS Generative AI Innovation Center. With over 10 years of experience in deep learning and generative AI, Parmida brings deep expertise in AI and customer-centric solutions. She has led and co-authored influential scientific papers in areas such as computer vision, interpretability, and video and image generation. With a strong focus on scientific rigor, Parmida helps customers design practical AI-powered systems that are productive in both structured and unstructured pipelines.

Dan Volk

Dan Volk is a Senior Data Scientist at the AWS Generative AI Innovation Center. He has 10 years of experience in machine learning, deep learning, and time series analysis, and holds a Master's degree in Data Science from UC Berkeley. He is interested in turning complex business challenges into opportunities by using cutting-edge AI technologies.

Paul Amadeo

Paul Amadeo is a veteran technology leader with over 30 years of experience spanning artificial intelligence, machine learning, IoT systems, RF design, optics, semiconductor physics, and advanced engineering. As the Technical Lead for Physical AI in the AWS Generative AI Innovation Center, Paul focuses on translating AI capabilities into tangible physical systems, guiding enterprise customers through complex implementations from concept to production. His diverse background includes designing computer vision systems for edge environments, designing smart card manufacturing technologies for robots that have produced billions of devices worldwide, and leading collaborative teams in both the commercial and defense sectors. Paul has an MS in Applied Physics from the University of California, San Diego, a BS in Applied Physics from Caltech, and holds six patents covering optical systems, communication devices, and manufacturing technologies.

Sri Elaprolu

Sri Elaprolu is the Director of the AWS Generative AI Innovation Center, where he leads a global team implementing cutting-edge AI solutions for business and government organizations. During his 13-year tenure at AWS, he led scientific ML teams that partnered with global businesses and public sector organizations. Prior to AWS, he spent 14 years at Northrop Grumman in product development and software engineering leadership roles. Sri holds a Masters in Engineering Science and an MBA.
