How Baz improved AI agent code review accuracy using Amazon Bedrock AgentCore

0 2 5 minutes read

How Baz improved AI agent code review accuracy using Amazon Bedrock AgentCore

Code reviews were always manual and ineffective because of the disconnect between code and product. Developers can review whether the code is compiled and working, but not whether it meets all the functional and design requirements. In the past, QA teams spent hours manually clicking on previews to make sure features behaved as expected, and more time aligning the implementation with the design intent. This manual validation slows down deliverability, introduces inconsistencies, and increases the chances of regression. With the increased speed of development teams, Baz wanted to automate this missing layer of validation, bringing intent, behavior, and implementation into a single review workflow.

This post walks through how Baz built their Special Review agent using Amazon Bedrock and Amazon Bedrock AgentCore. We'll include the architecture decisions, implementation details, and business results they've achieved by using these AWS services to automate their code review process.

Important problems that Baz is trying to solve

Baz is designed to go beyond standard reviews, only exceptions and to ensure that a feature meets the product's intended needs. Early on, Baz noticed that teams struggled with updates that focused on syntax instead of behavior, leaving critical questions like “does it work”, “does it match spec”, “does it behave as intended”, to be answered manually and late in the process. This gap between code and product intent slowed down the team, created design inconsistencies, and required relying heavily on internal QA knowledge that lacked documentation. Baz set out to close this gap with build agents who could not only test the code, but the actual experience delivered.

Solution overview

The Baz Spec Review agent organizes a complex multi-stage verification pipeline: When it triggers (webhook or manual request), it simultaneously queries Figma via MCP and Jira via REST APIs to gather the necessary broad artifacts including technical, product, and design specifications. The system then spawns independent workers (one per requirement) tasked with verifying the requirement. This subagent includes checking the code with the source code repository and validating runtime variables using the Amazon Bedrock AgentCore Browser Tool. The subagent interacts with temporary environments, performs DOM inspection, event simulation, and visual inspection to ensure the implemented implementation conforms to both Figma design specifications and behavioral requirements, delivering final validation throughout the lifecycle of specification to implementation with native AWS orchestration.

The following diagram shows the architecture of Spec Reviewer, a joint solution from Baz and AWS that enables automated design and product verification within your code review workflow. All agent flows are powered by massive language models powered by Amazon Bedrock, providing seamless and secure AI throughout the pipeline. The flow starts when a GitHub webhook starts on a new pull request, routing traffic through the Application Load Balancer (ALB) and Network Load Balancer (NLB) to the Amazon EKS cluster. The Baz Platform acts as a central orchestration layer, coordinating the multi-agent review process.

Within an Amazon EKS cluster, Baz's Special Review Agent divides the validation workflow into specialized subagents. The Specification Subagent, powered by Amazon Bedrock, imports both visual specifications from Figma and functional specifications from Jira, then breaks them down into separate requirements – visual requirements (such as spacing, colors, and component order) and functional requirements (such as acceptance criteria and user story intent).

Implementation Subagents are the core of this architecture. These Amazon Bedrock-enabled agents perform deep code analysis against extracted data, but what sets them apart is their integration with the ability to use the Amazon Bedrock AgentCore Browser. Rather than relying solely on static code analysis, Implementation Subagents can provide actual implementation in a live Preview environment and visually verify that the UI matches the intended Figma designs and that functionality is working as specified in Jira. This combination of code understanding and browser-based validation enables Baz to catch inconsistencies that traditional code analysis tools would completely miss.

The Report Generator combines findings from all sub-factors into a coherent review summary. Once the review is complete, the findings are disseminated through the appropriate channels: comments are sent directly to GitHub PR, notifications are sent to Slack for team visibility, and identified issues can be automatically linked to Jira for tracking and resolution.

How Baz used Amazon Bedrock AgentCore to address these challenges

Amazon Bedrock AgentCore was the foundation for building an AI code reviewer capable of validating real-world product behavior. Its secure, isolated, serverless browser sessions allow the Spec Reviewer agent to open preview areas, navigate through features, and test UI behavior just as a user would. By integrating the Amazon Bedrock AgentCore runtime to run MCP servers including ticketing systems, the Amazon Bedrock AgentCore Browser tool with lightweight automation modules and context, Baz Reviewer can compare live behavior and code against ticket specifications and designs without requiring any browser infrastructure or custom orchestration. Amazon Bedrock AgentCore's isolation, sandboxing, and visibility help Baz scale multiple MCP servers and allow the agent to securely and reliably perform full-stack authentication at scale.

Enables smart code reviews with Amazon Bedrock

Amazon Bedrock powers the reasoning and decision-making layer behind the Spec Reviewer agent, empowering it to interpret requirements, understand design intent, and evaluate the consistency of behavior seen in the browser. Using base models managed by Amazon Bedrock, an agent can gather the context of a specification, analyze UI scenarios, and draw accurate, actionable conclusions about whether a feature meets expectations. Amazon Bedrock provides the reliability, security, and scalability needed for production-level agent operations, allowing Baz to offload complex interpretation and validation logic to high-performance LLM while keeping browser functionality isolated within AgentCore. This combination allows the reviewer to bridge the gap between what was intended and what was actually built.

The conclusion

Agent Baz Spec Review shows how Amazon Bedrock and Amazon Bedrock AgentCore enable organizations to automate product validation workflows that previously required significant manual effort. Using Amazon Bedrock's foundational models for defining requirements and making decisions, combined with the secure browser capabilities of AgentCore, Baz created a solution that ensures implementation against specifications throughout the development lifecycle, reducing reported bugs by up to 50% and integration time by 30-70%

Customers who adopt Spec Reviewer have seen a significant reduction in manual product validation work, with feature validation starting in the development cycle and happening automatically in pull requests. Teams report faster updates, fewer regressions, and higher confidence that changes meet requirements before integration.