Organize financial documents using Amazon Bedrock Data Automation

Financial institutions process thousands of documents every day, including tax forms, loan statements, and purchase orders. Each has its own format, layout, and field names, which makes it challenging to build an automated workflow with optical character recognition (OCR) software alone. Amazon Bedrock Data Automation (BDA) helps solve these challenges by automating the extraction, validation, and analysis of data from financial documents. BDA goes beyond simple OCR by using foundation models that:

  • Understand the context of the document
  • Recognize relationships between different fields
  • Extract structured, actionable data
  • Verify information across multiple sources

While foundation models like Anthropic Claude can extract content from PDFs, Amazon Bedrock Data Automation offers custom extraction with industry-leading accuracy at low cost, as well as features such as visual grounding with confidence scores for explainability and built-in hallucination mitigation.

In this post, we explore how Amazon Bedrock Data Automation can accurately extract information from four common types of financial documents: bank statements, W-2 forms, 1099-B tax forms, and vendor contracts. We highlight the complexities of each document, specify the custom blueprints created in Amazon Bedrock Data Automation, and describe the results of the extraction process.

Solution overview

Amazon Bedrock Data Automation allows you to configure your output based on your processing needs using blueprints. A blueprint in Amazon Bedrock Data Automation is a configuration template that defines how data should be extracted from documents. It specifies:

  • The document type being processed
  • Data fields to be extracted
  • Validation rules for extracted data
  • Output structure and format

Think of it as a map that tells Amazon Bedrock Data Automation what information to look for and how to process it. When you use a blueprint, you can choose a catalog blueprint or create a custom blueprint. A custom blueprint lets organizations define extraction patterns for their specific needs. In this post, we created custom blueprints and used the BDA console to generate and validate the output.

How to create blueprints for four types of financial documents

The following sections walk you through creating custom blueprints for bank statements, W-2 forms, 1099-B forms, and vendor contracts.

Prerequisites

If you're not familiar with how custom blueprints are created, follow the instructions in the Amazon Bedrock documentation. To test, we uploaded the documents to the BDA console, edited the AI-generated prompts, and downloaded the results. In general, a single custom blueprint is sufficient for a given document type when extracting a fixed set of fields. However, if workflow requirements differ or document formats vary significantly, you might need to create multiple custom blueprints to accommodate the differences. After a blueprint is created, you can use it as part of a parallel processing workflow. For the same blueprint, if the input documents contain different data, BDA might return slightly different output (for example, some bank statements include total debits and credits). However, because BDA output is formatted as JSON, it is straightforward to create the appropriate rules for the processing workflow (for example, discarding the total amounts if the workflow posts each debit and credit transaction separately in accounting).
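A rule of the kind described above can be sketched in a few lines of Python. The JSON layout, the Transactions key, and the "total" description heuristic are assumptions made for illustration; they are not the documented BDA output schema, so adapt the key names to the output you actually receive.

```python
import json

# Hypothetical extraction output; the real BDA JSON layout may differ.
SAMPLE_OUTPUT = """
{
  "Transactions": [
    {"Date": "01/03", "Description": "Payroll deposit", "Debit": null, "Credit": 2500.00},
    {"Date": "01/05", "Description": "Grocery store", "Debit": 82.15, "Credit": null},
    {"Date": "", "Description": "Total debits and credits", "Debit": 82.15, "Credit": 2500.00}
  ]
}
"""


def drop_summary_rows(transactions):
    """Keep individual transactions and discard rows that look like totals."""
    return [
        row for row in transactions
        if not row["Description"].strip().lower().startswith("total")
    ]


extracted = json.loads(SAMPLE_OUTPUT)
ledger_rows = drop_summary_rows(extracted["Transactions"])
print(len(ledger_rows))
```

Keeping the rule outside the blueprint means the same blueprint can serve workflows that need the totals and workflows that do not.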

The following screenshot shows the blueprint information for one of the document types.

Quick Blueprint configuration in the Amazon Bedrock Data Automation console

The following section describes the four documents tested as part of this project and the output obtained using custom blueprints based on the requirements. Output is available in JSON, CSV, and raw data formats, highlighting the solution's flexibility for various integrations and reporting needs.

Types of financial documents and custom blueprints

Amazon Bedrock Data Automation offers built-in blueprints for common document types, including bank statements and W-2 forms. These built-in blueprints provide complete output out of the box. In this post, we use custom blueprints to show how organizations can tailor the extraction to their specific workflow needs. For example, you can extract only the transaction data from bank statements for automated accounting, or group W-2 fields into logical structures (federal tax, state tax, code-value pairs) that match downstream tax processing systems. Custom blueprints are also the way to handle document types that don't have built-in blueprints, such as the 1099-B forms and vendor contracts shown later in this post.

1. Bank Statements – Documents from banks detailing the account's financial activity, including deposits, withdrawals, and payments, over a period of time, usually a month.

Bank statements present a difficult challenge: they contain many monthly transactions, often spanning many pages, with varying formats and details. In many workflows, a key task is to accurately capture transaction data, including dates, amounts, descriptions, and reference numbers, which can feed directly into automated accounting workflows such as categorizing transactions in a ledger. This automated extraction reduces manual data entry errors and streamlines the reconciliation process. As part of our testing process, we selected the following bank statement to test the extraction process:

Sample bank statement used to test the extraction

The bank statement was generated using the Amazon Nova Pro foundation model

Instructions for the Amazon Bedrock Data Automation blueprint:

Create a transaction log blueprint with the following structure:

Main Field:
- Transactions: [TRANSACTION_DETAILS]

Custom Type:
1. TRANSACTION_DETAILS type containing:
   - Date
   - Description
   - Debit: number
   - Credit: number

Output results in table.csv:

Output results showing transaction data in CSV format

On review, we confirmed that the system extracted the transactions correctly.
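A review like this can be partly automated once table.csv is downloaded. The sketch below loads the rows and totals the debits and credits so they can be compared against the figures printed on the statement. The header names follow the blueprint fields above, but the actual file layout and the sample rows are assumptions for illustration.

```python
import csv
import io

# Hypothetical contents of table.csv; header names follow the blueprint
# fields, but the real file layout is an assumption.
TABLE_CSV = """Date,Description,Debit,Credit
01/03,Payroll deposit,,2500.00
01/05,Grocery store,82.15,
01/09,Electric bill,120.40,
"""


def to_amount(cell):
    """Convert a cell such as '$1,250.00' to float, treating blanks as 0."""
    cleaned = cell.replace(",", "").replace("$", "").strip()
    return float(cleaned) if cleaned else 0.0


def load_transactions(handle):
    """Read CSV rows into dictionaries with numeric debit and credit values."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "date": record["Date"],
            "description": record["Description"],
            "debit": to_amount(record["Debit"]),
            "credit": to_amount(record["Credit"]),
        })
    return rows


transactions = load_transactions(io.StringIO(TABLE_CSV))
total_debits = round(sum(t["debit"] for t in transactions), 2)
total_credits = round(sum(t["credit"] for t in transactions), 2)
print(total_debits, total_credits)
```

With a real file, replace the in-memory string with open("table.csv", newline="") and compare the two totals with the statement's own summary line.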

2. Form W-2 – Reports the income paid to an employee and the taxes withheld from it.

W-2 tax forms present unique extraction challenges because of their standardized but complex structure. As part of our testing process, we used the following W-2 to test the extraction process:

Sample W-2 form used for extraction testing

The W-2 was generated using the Amazon Nova Pro foundation model

Instructions for the Amazon Bedrock Data Automation blueprint:

Create a detailed W2 form blueprint with the following structure:

Main Fields:
- employer_info: EmployerInfo
- employee_general_info: EmployeeInfo
- federal_tax_info: FederalTaxInfo
- federal_wage_info: FederalWageInfo
- filing_info: FilingInfo
- state_taxes_table: [StateTaxInfo]
- codes: [CodeAmount]
- nonqualified_plans_income: number
- other

Custom Types:
1. EmployerInfo type containing:
   - ein
   - employer_name
   - employer_address
   - employer_zip_code: number
   - control_number

2. EmployeeInfo type containing:
   - ssn
   - first_name
   - employee_last_name
   - employee_name_suffix
   - employee_address
   - employee_zip_code: number

3. FederalWageInfo type containing:
   - wages_tips_other_compensation: number
   - social_security_wages: number
   - medicare_wages_tips: number
   - social_security_tips: number

4. FederalTaxInfo type containing:
   - federal_income_tax: number
   - social_security_tax: number
   - medicare_tax: number
   - allocated_tips: number

5. StateTaxInfo type containing:
   - state_name
   - employer_state_id_number: number
   - state_wages_and_tips: number
   - state_income_tax: number
   - local_wages_tips: number
   - local_income_tax: number
   - locality_name

6. CodeAmount type containing:
   - code
   - amount: number

7. FilingInfo type containing:
   - omb_number
   - verification_code

Output results in result.json:

W-2 output results showing employer and employee information in JSON format

W-2 output results showing tax information and code information in JSON format

On review, we confirmed that the system extracted the fields correctly. Several extraction complexities were specifically verified for this project:

  • The form has no explicit grouping for federal wage information and federal tax information, but the two need to be processed together, so the output should include both.
  • Box 12 of a W-2 can hold up to 26 codes that report specific compensation and benefit amounts. It is important to extract each code and its amount as a pair.
  • Employers can put almost anything in box 14. It is a catch-all for items that don't have their own dedicated box on the W-2, so these should be extracted separately.
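The three checks above can be expressed as a small post-processing step over result.json. The following is a minimal sketch: the key names follow the blueprint in this section, while the sample values and the helper functions federal_summary and incomplete_code_pairs are invented for illustration.

```python
# Hypothetical slice of result.json; key names follow the blueprint above,
# values are invented for illustration.
w2_result = {
    "federal_wage_info": {
        "wages_tips_other_compensation": 85000.0,
        "social_security_wages": 85000.0,
        "medicare_wages_tips": 85000.0,
        "social_security_tips": 0.0,
    },
    "federal_tax_info": {
        "federal_income_tax": 12750.0,
        "social_security_tax": 5270.0,
        "medicare_tax": 1232.5,
        "allocated_tips": 0.0,
    },
    "codes": [
        {"code": "D", "amount": 5000.0},
        {"code": "DD", "amount": 7200.0},
        {"code": "W", "amount": None},
    ],
    "other": "UNION DUES 240.00",
}


def federal_summary(result):
    """Merge the two federal groups so they can be processed together."""
    return {**result["federal_wage_info"], **result["federal_tax_info"]}


def incomplete_code_pairs(codes):
    """Return box 12 entries that are missing either the code or the amount."""
    return [
        entry for entry in codes
        if not entry.get("code") or entry.get("amount") is None
    ]


summary = federal_summary(w2_result)
unpaired = incomplete_code_pairs(w2_result["codes"])
box_14_items = w2_result.get("other")
print(len(summary), unpaired, box_14_items)
```

Any entry returned by incomplete_code_pairs is a candidate for manual review, because a code without its amount (or the reverse) cannot be posted downstream.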

3. IRS Form 1099-B: Proceeds from Broker and Barter Exchange Transactions – This tax document tracks:

  • Securities trading activity
  • Transactions made through a broker
  • Barter exchange participation

As part of our testing process, we used the following 1099-B to test the extraction process:

Sample 1099-B form used for extraction testing

The 1099-B statement was generated using the Amazon Nova Pro foundation model

Instructions for the Amazon Bedrock Data Automation blueprint:

Create a financial transaction blueprint with the following structure:

TRANSACTION_DETAILS type containing:
- security_description
- quantity_sold: number
- date_acquired
- date_sold_or_disposed
- proceeds: number
- cost_or_other_basis: number
- gainloss_amount: number
- additional_information

Output results in table.csv:

1099-B output results showing transaction details in CSV format

An important validation of BDA's contextual understanding capabilities is that the system correctly identified and extracted 'TSLA' as the security description for all stock transactions, even though it appeared only as a general descriptor for the transactions. This consistent output demonstrates BDA's ability to maintain contextual accuracy throughout document processing.
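The same validation can be scripted against table.csv. The sketch below checks that every row carries the expected security description and that the gain or loss equals proceeds minus cost basis. The column names follow the blueprint fields, the sample rows are invented, and the arithmetic check assumes no adjustments (such as a disallowed wash sale loss) apply to the row.

```python
import csv
import io
from decimal import Decimal

# Invented sample rows; column names follow the blueprint fields above.
# The third row contains a deliberate gain/loss error.
TABLE_CSV = """security_description,quantity_sold,date_acquired,date_sold_or_disposed,proceeds,cost_or_other_basis,gainloss_amount,additional_information
TSLA,10,01/15/2023,06/20/2023,2500.00,2000.00,500.00,
TSLA,5,02/01/2023,07/11/2023,1300.00,1450.00,-150.00,
TSLA,8,03/10/2023,08/02/2023,1900.00,1700.00,250.00,
"""


def check_rows(handle, expected_security):
    """Return (line number, problem) tuples for rows that fail a check."""
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        if row["security_description"].strip() != expected_security:
            problems.append((line_no, "unexpected security"))
        expected_gain = Decimal(row["proceeds"]) - Decimal(row["cost_or_other_basis"])
        if Decimal(row["gainloss_amount"]) != expected_gain:
            problems.append((line_no, "gain/loss mismatch"))
    return problems


problems = check_rows(io.StringIO(TABLE_CSV), "TSLA")
print(problems)
```

Decimal is used instead of float so that currency amounts compare exactly.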

4. Vendor contract – This extraction process can be applied to a variety of vendor contracts. The specific details to capture need to be tailored to each company's unique workflow and needs.

As part of our testing process, we selected the following vendor contract to test the extraction process:

Sample vendor contract page 1

Sample vendor contract page 2

Sample vendor contract page 3

Sample vendor contract page 4

Instructions for the Amazon Bedrock Data Automation blueprint:

Create an agreement blueprint with the following structure:

Main Fields:
- PARTICIPANT_DETAILS: PARTICIPANT_DETAILS
- effective_date
- time_period
- participant_requirements: PARTICIPANT_REQUIREMENTS
- confidentiality_obligations
- TERM_AND_TERMINATION: TERM_AND_TERMINATION

Custom Types:
1. PARTICIPANT_DETAILS type containing:
   - participant_name
   - participant_authorized_representative

2. PARTICIPANT_REQUIREMENTS type containing:
   - assigned_resources
   - participant_obligations
   - participant_restrictions

3. TERM_AND_TERMINATION type containing:
   - term
   - termination_conditions

Output results in result.json:

Vendor contract extraction results in JSON format

The system successfully identified and extracted the fields specified in the blueprint that were present in the contract.
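Because a contract might not contain every field named in the blueprint, a downstream workflow may want to list what is missing before it continues. The following sketch walks result.json using dotted paths taken from the blueprint above. The sample values and the lookup and missing_fields helpers are hypothetical.

```python
# Dotted paths derived from the blueprint fields in this section.
REQUIRED_PATHS = [
    "PARTICIPANT_DETAILS.participant_name",
    "PARTICIPANT_DETAILS.participant_authorized_representative",
    "effective_date",
    "time_period",
    "participant_requirements.assigned_resources",
    "participant_requirements.participant_obligations",
    "participant_requirements.participant_restrictions",
    "confidentiality_obligations",
    "TERM_AND_TERMINATION.term",
    "TERM_AND_TERMINATION.termination_conditions",
]

# Invented sample of result.json for illustration.
contract_result = {
    "PARTICIPANT_DETAILS": {
        "participant_name": "Example Vendor LLC",
        "participant_authorized_representative": "J. Doe",
    },
    "effective_date": "2024-01-01",
    "time_period": "12 months",
    "participant_requirements": {
        "assigned_resources": "Two engineers",
        "participant_obligations": "Deliver monthly reports",
        "participant_restrictions": "No subcontracting",
    },
    "confidentiality_obligations": "Mutual, survives termination",
    "TERM_AND_TERMINATION": {"term": "One year", "termination_conditions": ""},
}


def lookup(document, dotted_path):
    """Return the value at a dotted path, or None if any key is absent."""
    node = document
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_fields(document, paths):
    """List paths whose value is absent or empty."""
    return [p for p in paths if lookup(document, p) in (None, "", [])]


missing = missing_fields(contract_result, REQUIRED_PATHS)
print(missing)
```

Which paths count as required is a workflow decision; the list here simply mirrors every field in the blueprint.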

Conclusion

In this post, we showed how you can use Amazon Bedrock Data Automation to accurately extract important information from financial documents, including bank statements, W-2 forms, 1099-B forms, and vendor contracts, to automate downstream processing. You learned how to:

  • Create custom blueprints for different types of documents
  • Extract structured data from complex financial documents
  • Validate Amazon Bedrock Data Automation output for downstream processing

To learn more about document processing with Amazon Bedrock, review the Amazon Bedrock Data Automation documentation. For production workflows that involve sensitive information, follow your organization's security and legal guidelines to ensure compliance with all applicable laws, including but not limited to Europe's GDPR and any other regional or industry-specific requirements.


About the authors

Shivanshu Upadhyay

Shivanshu is a Principal Solutions Architect in the AWS Industries group. In this role, he helps AWS's most advanced adopters transform their industries by effectively leveraging data and AI.

Hey Shah

Ayu is a Sr. Solutions Architect at Amazon Web Services (AWS). He helps digital native customers design and implement generative AI and machine learning (ML) solutions on AWS. Ayu enjoys helping customers achieve their business goals and solve complex challenges using AWS services and best practices. He also brings deep expertise in networking and security.
