Lost in Translation: How AI Exposes the Rift Between Law and Logic

Over the last few years, I have sat in multiple meetings with IT and legal teams where intent and motivation were visibly misaligned. It often feels like two completely different worlds trying to reach agreement under pressure. As one colleague once described it to me: “Legal writes for humans, IT builds for machines.” Law allows interpretation, context, and mitigation, while IT depends on logic and deterministic workflows. The result is that even small ambiguities can lead to weeks of wasted effort building solutions that were never legally viable in the first place.

The problem, solution and predicted outcomes I outline here are aimed at three audiences: business leaders, IT professionals (more specifically in Data & AI) and legal professionals trying to navigate the growing disconnect in implementing compliant data solutions. Rather than treating the problem as a communication issue alone, this article proposes a practical framework for translating legal intent into machine-readable and architecture-aware controls that can scale with modern data and AI ecosystems.

For years this tension was manageable, but it has been exposed since the GDPR was adopted in 2016, and the influx of AI requests from all corners of the business is about to expose the gap at scale.

Business: outcomes over everything

The business prioritises growth, revenue, optimisation and competitive advantage. Its language is KPIs, margins and performance. Compliance, from its perspective, is a constraint to navigate and not a mission in itself. Business teams are not trying to be reckless; they simply operate in a world measured by results.

They want to:

  • Analyse customer behaviour
  • Test new features
  • Use AI to personalise experiences
  • Extract more value from data

Legal: risk, mitigation and defensibility

Legal does not operate in absolutes. Zero risk is rarely demanded or realistic; they operate in terms of acceptable risk. They do want any risk taken to be defensible, so that if they are challenged on something, they can demonstrate they acted responsibly.

They think in terms of:

  • Lawful basis
  • Proportionality
  • Mitigation
  • Demonstrable intent
  • Defensibility in case of investigation

Legislation is written in narrative form and is intentionally principle-based. It leaves room for interpretation and legal professionals are trained to interpret that narrative. They are not trained to design database schemas, configure access controls or define system-level enforcement mechanisms.

IT: deterministic controls

IT needs specificity and cannot work in narratives. “Reasonable safeguards” cannot be implemented. When legal says, “It depends,” IT thinks, “I cannot build this.”

IT needs to know:

  • Is this field personal data?
  • Can this dataset be used for model training?
  • What retention period should be enforced?
  • Should this attribute be masked or removed?
  • What exactly constitutes anonymisation here?

So, the business wants to create value that leads to profitability, while Legal’s motivation is to ensure compliance. Meanwhile, IT must support both by building reliable systems that deliver value while remaining compliant. This creates an uneven compliance accountability burden and, as a result, makes discussions and agreements between these departments painstakingly slow.

Image generated by author

How AI widens the gap

Previously this was manageable because the pace of data usage was slower. Manual oversight was still feasible and legal teams could review major initiatives one by one. That era will end when the volume and velocity of AI-driven data usage overwhelm traditional compliance models. Data is no longer analysed linearly; it is continuously processed, combined, enriched, repurposed and modelled. Autonomous agents can even trigger workflows, generate insights and make decisions without human review. At such scale, traditional legal oversight breaks down. Legal cannot manually assess every new data use case or “processing” [1] activity, and IT does not have the capacity to interpret ambiguous legal clauses every time an engineer builds a new pipeline. Meanwhile, the business will not slow innovation to wait for interpretive debates.

Image generated by author

From legal text to architecture-aware compliance

Legal intent is rarely encoded in a way that systems can validate. Instead, compliance lives in PDFs, policies, meeting minutes and emails. That works to a point, but fails at scale.

What is missing is a shared interface in a structured, machine-readable form, using concepts like structured metadata, policy-as-code and data contracts to add translation layers. So instead of open-to-interpretation discussions, we use AI to validate usage automatically and autonomously.

Machine-readable governance would allow Legal to define the acceptable boundaries, IT to implement enforceable constraints, and the Business to see clearly what is or isn’t permitted. We need to move from theoretical compliance to observable compliance.

The proposal

The Core Idea: Eradicate human error in handoffs between Legal and IT

Unstructured human conversation is a poor mechanism for transferring precise, technical, legally-consequential decisions between people who do not share a common language. The goal is not to eliminate human judgment, but to replace the parts of the process where human interaction adds friction and error rather than value.

I propose a simple organisational principle: replace the unstructured human handoff with a structured, AI-assisted process. The failure point in traditional compliance is not that humans are involved; it is that humans with different motivations are asked to reach precise, technically implementable agreements through open-ended conversation. The solution described here is designed to eliminate that friction systematically by structuring the inputs, automating the checks, and reserving human decision making for the genuinely ambiguous cases, not the routine ones.

Before walking through the process, a few foundational concepts are worth establishing clearly. These do not require technical expertise to grasp, and in fact, consistent use of this terminology across Business, Legal and IT is itself part of the solution.

Data Products, Output Ports and Data Contracts

These terms are popular in Data Mesh, but you don’t need a mesh to benefit from them. Data Mesh is a method to manage data in large organisations by giving ownership of data to the teams that know it best, rather than centralising it in a single department. Each team treats their data as a product, something they are responsible for maintaining and making available to others in a consistent, governed way. Even in a centralised architecture, adopting data products, output ports, and data contracts as a standard vocabulary brings immediate value. It creates a shared language to clearly define what data is, who can access it, through which channels, and for what purpose. Without it, governance conversations between Business, Legal, and IT will continue to break down. The terminology comes first, followed closely by the architecture.

Data Product Model — generated by author
Data Product core concepts — created by author

In practice, the chain could look like this:

  • a data product (“website user behaviour”)
  • exposes data through an output port (set of tables in a database)
  • governed by a data contract (which specifies that the website user behaviour data may be used for enhancement of the products that get sold on the website but not for marketing or customer profiling, must be retained for no more than three years, and requires pseudonymisation before use in any predictive or AI model)

Every layer is explicit. Every layer is enforceable.
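
To make "explicit and enforceable" concrete, here is a minimal Python sketch of how that chain could be held as data rather than prose. The class name, field names and purpose labels are assumptions made for illustration only; they are not taken from any standard or from a particular platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataContract:
    """Terms attached to one output port (all field names are illustrative)."""
    data_product: str
    output_port: str
    allowed_purposes: frozenset[str]
    prohibited_purposes: frozenset[str]
    max_retention_days: int
    pseudonymise_for_models: bool


# The "website user behaviour" example from the text, expressed as data.
contract = DataContract(
    data_product="website_user_behaviour",
    output_port="behaviour_tables",
    allowed_purposes=frozenset({"product_enhancement"}),
    prohibited_purposes=frozenset({"marketing", "customer_profiling"}),
    max_retention_days=3 * 365,  # "no more than three years", ignoring leap days
    pseudonymise_for_models=True,
)


def purpose_permitted(contract: DataContract, purpose: str) -> bool:
    """A purpose is permitted only if explicitly allowed and not prohibited."""
    return (
        purpose in contract.allowed_purposes
        and purpose not in contract.prohibited_purposes
    )


def retention_exceeded(contract: DataContract, collected_on: date, today: date) -> bool:
    """True when a record is older than the contract's retention limit."""
    return today - collected_on > timedelta(days=contract.max_retention_days)
```

Note the design choice in `purpose_permitted`: a purpose that is simply not mentioned is refused. Whether "unlisted" should mean "refused" or "escalate to Legal" is itself a decision the organisation has to make.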

What a Data Contract Looks Like

The example below shows a data contract assigned to the Website User Behaviour data product. The output port is a set of Iceberg tables in an S3 bucket, a common pattern in modern data platforms where consumers are free to choose the tool they use to consume the data. The contract travels with that output port: whoever accesses those tables, for whatever reason, does so under the terms recorded here.

Data Contract details – generated by author

A more detailed standard for data contracts, the Open Data Contract Standard (ODCS), can be found in [2].

The data contract will contain either reference to or details about the legal purposes and other constraints by which this access layer needs to be governed.

Data Contract in the context of legal terms and documents – created by author

How it Works in Practice

What I have seen working well are organisations that already use Data Product and Data Contract terminology, where these human conversations and agreements are captured as metadata artefacts and then applied throughout data pipelines and downstream consumption. However, it still takes significant time to establish these agreements in the first place, and it is even more difficult to maintain and update them reliably as the organisation evolves.

To resolve these pain points, I am proposing an AI-assisted workflow organised into three phases: PREP, MAP and RUN.

PREP happens once and evolves incrementally. MAP runs for every new data activity or amendment. RUN is the ongoing automated monitoring that operates continuously once the contracts are in place. At no stage does ambiguity get passed silently from one person to another and mistaken for agreement.

Proposed three phase approach – created by author

Phase 1 — PREP

Before any data activity can be governed through this process, IT takes the lead in creating the framework for it: the data product catalogue, the output port standard, the data contract template, and the LLM-assisted interfaces used in the MAP phase. This is the foundation on which everything else depends.

Now, as I say that, I can also confidently say the following: If there is one thing I have learned from working in and with large organisations, it is that there is absolutely no point in waiting for every component to be perfectly in place before starting. In practice, that moment never arrives. Most organisations already have pieces of the puzzle, even if they are fragmented or immature. A large enterprise already running some form of Data Mesh might unknowingly be 80% of the way there, while a smaller startup may still be building its foundations from scratch. The reality is that these transformations are almost always iterative, messy, and happening while the business continues to move.

Start with what you have to build out a proper data catalogue, focusing on the low-hanging fruit: high-value data sets that are frequently requested. For each “data product” (in quotes because that might not be what you call it, and defining your disparate set of tables might be your first big hurdle), capture only the information that matters most at this stage: the name of the data product, the team that owns it, how it is shared, and a one-line description of what it contains. Everything else (detailed field-level tagging, data quality metrics, lineage graphs) can be added iteratively as the catalogue matures and the organisation builds confidence in the process. A catalogue with five well-understood data products is more valuable than a catalogue with two hundred partially documented ones.

Critically, PREP cannot be approached as a big bang. A programme that attempts to catalogue every data product, define every output port and contract every dataset before anything goes live will stall before it starts. The effort is too large, stakeholder appetite will wane, and the business will move on without the framework.

The LLM interfaces themselves follow the same incremental principle. A working MVP might be a running Claude instance with a connector to your data catalogue using a pre-built MCP server (I use Entropy Data as it is very easy to set up for exploratory purposes).

Output: Not a finished framework, but a working MVP focused on one or a few high value data sets.

Phase 2 — MAP

The MAP phase runs every time the business proposes a new data processing activity or an existing data activity changes: a new AI model, a new pipeline, a new access request, or an amendment to an existing contract. It is the structured replacement for the open-ended compliance conversation. Three steps, three owners, one clean handoff each time, with the LLM ensuring that nothing ambiguous passes between them unexamined.

Step 1 — Business initiates a change or new processing activity

When the business proposes anything that involves data, a new AI feature or an additional source to an existing pipeline for example, this process is triggered. This is NOT a meeting. It is a structured conversation with an LLM configured to capture the nature of the change or new processing activity and translate it into the information needed to engage the formal MAP process.

Step 1 – Business (by author)

The LLM asks direct questions, based on the framework that IT has set up. Depending on the answers from business, the follow-up questions are tailored based on the existing metadata, company knowledge documents and whatever the LLM has been set up to access.

If the business says something vague like “We want to use customer data differently,” the LLM will drive the conversation into the right structured approach and will check whether the request falls within the purposes already recorded on the relevant output port’s contract, or whether it constitutes a new processing activity requiring a new or amended contract.

The result is a structured description of the change or new activity: the data product, the output port, the declared purpose and any amendments required.

Technically, this could be backed by YAML files with change tracking in Git, while the LLM presents the result as a PDF or a document on a shared wiki.

Output: A structured description of the change or new processing activity, covering the data product, output port, declared purpose and scope of change, together with any unanswered questions flagged explicitly.
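
As a rough illustration of what Step 1 hands over, the sketch below models the structured description and the check against purposes already recorded on the contract. It is a simplified stand-in for what the LLM-driven conversation would produce; `ChangeRequest`, `classify_request` and the status strings are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Structured output of the Step 1 conversation (hypothetical shape)."""
    data_product: str
    output_port: str
    declared_purpose: str
    scope_of_change: str = ""
    open_questions: list[str] = field(default_factory=list)


def classify_request(request: ChangeRequest, recorded_purposes: set[str]) -> str:
    """Decide whether the request fits the existing contract or needs a change.

    Anything that cannot be decided here is added to open_questions so it is
    flagged explicitly rather than passed on silently.
    """
    if not request.declared_purpose:
        request.open_questions.append("What is the purpose of this processing?")
        return "incomplete"
    if request.declared_purpose in recorded_purposes:
        return "within_existing_contract"
    request.open_questions.append(
        f"Is '{request.declared_purpose}' compatible with the purposes "
        f"already recorded on output port '{request.output_port}'?"
    )
    return "needs_new_or_amended_contract"
```

In a real setup the purpose matching would be less literal than a string comparison, which is exactly where the LLM and the existing metadata come in; the point here is only that the outcome and the open questions are recorded in a fixed shape.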

Step 2 — IT receives amended contracts

IT receives the structured description from Step 1 — whether that is a proposed new contract or an amendment to an existing one.

Step 2 – IT (by author)

IT can accept the amendments as they are (as they would have already been curated at this point) or they can choose to interrogate them further.

IT resolves what it can and passes only the genuine ambiguities forward. Questions sent to Legal are pre-scoped and specific. They are framed as precise binary choices, not open-ended narrative. This is what makes Step 3 efficient.

Output: A new draft contract or marked-up amendment, technically validated, with a short list of specific scoped questions for Legal — each framed to invite a deterministic answer, not a prose opinion.
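
One way to keep the questions for Legal "scoped" is to give them a fixed shape and reject anything open-ended before it is sent. The sketch below assumes a hypothetical `ScopedQuestion` structure; the limit of two or three answer options is my own illustrative threshold, chosen to allow yes, no, and "permitted under conditions".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedQuestion:
    """A question for Legal tied to one contract field (illustrative shape)."""
    contract_id: str
    field_name: str           # the contract field the answer will populate
    question: str
    options: tuple[str, ...]  # the only answers Legal is asked to choose from


def is_well_scoped(q: ScopedQuestion) -> bool:
    """Accept only questions with a target field and a small, fixed answer set."""
    return (
        bool(q.field_name.strip())
        and q.question.strip().endswith("?")
        and 2 <= len(q.options) <= 3
    )


good = ScopedQuestion(
    contract_id="website_user_behaviour/behaviour_tables",
    field_name="allowed_purposes",
    question="May this output port be used for model training?",
    options=("yes", "no", "permitted_under_conditions"),
)
too_open = ScopedQuestion(
    contract_id="website_user_behaviour/behaviour_tables",
    field_name="",
    question="What do you think about using this data for AI",
    options=(),
)
```

Here `is_well_scoped(good)` passes and `is_well_scoped(too_open)` fails, so the second question would go back to IT for rework instead of landing on Legal's desk.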

Step 3 — Legal reviews, decides and signs off

Legal receives a near-complete data contract with a short list of specific questions that require legal judgment. They interact with an LLM loaded with the relevant regulatory context, whether it be GDPR, the AI Act, sector-specific regulations or the organisation’s own policies. The LLM guides them through each question systematically.

Step 3 – Legal (by author)

This is the critical design point: the LLM is configured to push Legal toward deterministic answers. Not “It depends” or “We would need to assess,” but “Yes, this purpose is compatible with the declared lawful basis on this output port,” “No, this output port may not be used for model training without explicit consent,” or “Permitted under these specific conditions, which must be recorded in the data contract.” Wherever Legal exercises judgment, that judgment is captured as a structured decision, not a paragraph of prose that IT must interpret.

Legal owns what is signed off. The data contract is updated to reflect their decisions. If the access request is approved, the consuming data product is added to the catalogue and the contract governs its use automatically. If challenged later, the record shows exactly what was assessed, by whom, on what date and on what basis. There is no ambiguity about accountability.
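
A structured decision of this kind could be captured along the following lines. This is a sketch under my own assumptions: the `Outcome` values mirror the three answer types described above, and `LegalDecision` with its validation rules is a hypothetical record shape, not a prescribed one.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Outcome(Enum):
    YES = "yes"
    NO = "no"
    CONDITIONAL = "permitted_under_conditions"


@dataclass(frozen=True)
class LegalDecision:
    """Audit record of one sign-off: what, by whom, when and on what basis."""
    question: str
    outcome: Outcome
    decided_by: str
    decided_on: date
    basis: str
    conditions: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # A conditional approval without conditions is just "it depends" again.
        if self.outcome is Outcome.CONDITIONAL and not self.conditions:
            raise ValueError("A conditional approval must list its conditions.")
        if self.outcome is not Outcome.CONDITIONAL and self.conditions:
            raise ValueError("Conditions are only valid on a conditional approval.")
        if not self.basis.strip():
            raise ValueError("Every decision must record the basis it rests on.")


decision = LegalDecision(
    question="May this output port be used for model training?",
    outcome=Outcome.CONDITIONAL,
    decided_by="privacy.counsel@example.com",  # placeholder identity
    decided_on=date(2025, 1, 15),              # placeholder date
    basis="Compatible with declared lawful basis if data is pseudonymised",
    conditions=("pseudonymise before training",),
)
```

Because the record is immutable and validated on creation, IT can copy the conditions straight into the data contract without having to interpret a paragraph of prose.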

A few of the legal professionals I have spoken to over the last couple of years have already started recognising this shift themselves. In my view, the organisations that will navigate it best are the ones where legal teams become more architecturally aware and technically literate. The more they understand systems, data flows, and implementation realities, the more likely they are to genuinely own and validate the decisions being made. That is where the idea of a “human in the loop” starts becoming meaningful rather than symbolic. If legal teams remain purely advisory while increasingly relying on LLM-generated interpretations, they may end up implicitly trusting outputs they cannot fully validate themselves. Ironically, that dependency could itself become a compliance risk.

Output: A finalised, legally signed-off data contract attached to the relevant output port — ready for IT to implement directly, with no further translation required.

Phase 3 — RUN

Once data contracts are in place on output ports, the RUN phase operates continuously and automatically. There are no meetings, no manual audits and no trigger required from Business, Legal or IT.

In one study, “Automating Data Governance with Generative AI” [3], the system checked 110 data access requests against privacy policies in real time. It caught every issue a human expert flagged, plus 3.6 times more warnings, 80% of which experts later confirmed as valid. Humans retain the final call on every flag. This is not replacing governance; it is making governance operate at the speed AI-driven data usage demands.

In practice, the RUN phase means the system is continuously doing what no team of humans could do manually at scale:

  • Checking every new data access request against existing contracts on output ports automatically at the moment the request is submitted, before any human reviews it.
  • Monitoring the live data landscape for emerging violations: when a new restriction is introduced, the system detects which already-approved contracts may be affected and surfaces them for review.
  • Answering governance queries in real time. Legal teams can ask conversationally: “Which data products currently have no documented lawful basis?” or “Which output ports are being accessed for purposes not listed in their contracts?” The system answers instantly — no audit, no email chain.
  • Flagging policy drift — when a regulation changes or a new internal policy is introduced, the system re-runs it against the entire contract landscape to identify what needs review.
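
The checks in this list can be sketched as plain functions over a contract registry. The example below is deliberately minimal and rule-based; in the workflow proposed here an LLM would sit in front of it to interpret requests and queries. The `Contract` shape, the function names and the dictionary keyed by output port are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contract:
    """Minimal contract record for the RUN checks (illustrative fields only)."""
    output_port: str
    allowed_purposes: frozenset[str]
    lawful_basis: Optional[str] = None


def check_access(contracts: dict[str, Contract], output_port: str,
                 purpose: str) -> tuple[bool, str]:
    """Evaluate a request at submission time, before any human reviews it."""
    contract = contracts.get(output_port)
    if contract is None:
        return False, "no contract on this output port"
    if purpose not in contract.allowed_purposes:
        return False, f"purpose '{purpose}' not listed in contract"
    return True, "within contract"


def affected_by_restriction(contracts: dict[str, Contract],
                            restricted_purpose: str) -> list[str]:
    """Output ports whose approved contracts a new restriction may touch."""
    return sorted(
        port for port, c in contracts.items()
        if restricted_purpose in c.allowed_purposes
    )


def missing_lawful_basis(contracts: dict[str, Contract]) -> list[str]:
    """Answers: which output ports have no documented lawful basis?"""
    return sorted(port for port, c in contracts.items() if not c.lawful_basis)
```

A refused request is a flag for review, not a final verdict: as in the study above, humans retain the final call.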

As with the other steps, start small and scale up. Start by evaluating only new data requests. You could then extend the process to periodically check a random sample of query logs to confirm that actual usage still fits the contract, or you could trigger quarterly audits. Do whatever your organisation requires.

Let’s be pragmatic: don’t expect a clean starting point. What the RUN phase will initially reveal is the governance debt that has been accumulating quietly for years. That is fine. Visible problems are solvable ones.

The Outcome

The Business Value of Getting this Right

The value of real automated and observable governance is hard to see if you are not aware of the tsunami that is coming over the horizon. Multiple vendors are already rolling out managed AI services.

What has surprised me most over the last year is how quickly non-technical people are beginning to build their own small networks of AI agents and automations. Today many of these are still prototypes or side projects, but it is obvious that this will soon spill over into real business processes at scale. I have personally seen incredibly impressive solutions built by people with little formal technical background, yet often with very limited awareness of the legal implications, governance requirements, or compliance prerequisites that come with them.

The study “AI agents under EU law: A compliance architecture for AI providers” [4] indicates that the legislative framework is not currently suitable for real, complex AI systems. Many in the industry predict that the legislation will not keep up, which means legal teams will need to be enabled to make risk-lowering decisions at scale.

In the process that I put forward, because Legal’s decisions are recorded as structured sign-offs on data contracts rather than prose opinions, legal accountability is clear, defensible and unambiguous. If a question arises about why a particular access request was approved, the record shows exactly what was assessed, by whom, on what date, and under what conditions. This is the difference between saying you acted responsibly and being able to demonstrate it.

My one concern with this approach

There is one future scenario that has me concerned (apart from us all becoming like the humans in Wall-E). What if my suggested plan goes too well? What if it becomes such a polished process that we become over-confident?

At first this feels like progress: frustrating, hard-to-understand human interactions become clear, structured handoffs that are easy to query. But because the handoffs are so seamless, confidence increases faster than comprehension.

Humans start reviewing the shape of the output rather than the substance. “It looks great” rather than “Yes, it is interpreting the request correctly”. Over time, the “human in the loop” becomes thinner. At first we review, then we only approve, then we start asking another LLM to check whether the first LLM missed anything. Eventually, the organisation is not really transferring knowledge between people anymore; it is transferring plausible summaries between systems. The danger is not that the AI makes one obvious mistake. The danger is that responsibility becomes distributed across a chain of beautifully formatted outputs that nobody fully owns, understands, or has time to interrogate. When something fails, everyone can point to a hand-off, a review, a summary, or an approval step. But the original context has evaporated. The loop still contains humans, technically. It just no longer contains human judgement at the point where it matters.

What if we do nothing?

As Jean-Paul Sartre said: “Once we know and are aware, we are responsible for our action and our inaction.” Failing to shift is not neutral; it’s a regressive choice. The AI Act’s phased enforcement means that organisations that have not operationalised AI literacy by the time each obligation takes effect are non-compliant from that date. The compliance window is closing while AI velocity keeps doubling.

In practice, every month of delay compounds the debt. The longer legal advice remains trapped in unstructured documents, informal interpretations, and human-only review cycles, the closer it moves from being merely inefficient to being indistinguishable from non-compliance. And as regulatory expectations shift toward demonstrable literacy, traceability, and accountability, “we didn’t know” will no longer be a credible defence.

If I had a crystal ball

If I were to look into my crystal ball I would predict some manifestation of the Gartner hype cycle “Plateau of productivity” playing out.

I believe around 2027/2028 there will be considerable pushback from regulators if we cannot regain control. We will have quite a few real-life horror stories in the news about agentic AI gone bad in the next two years. The pushback will be met by companies and consulting firms that promise better-governed implementations (like the one I propose), and then for years we will have this delicate dance between the regulators and the companies that are pushing the boundaries.

In the end we will survive and most likely it will be less dramatic than some (like myself) envisage.

[1] ‘Processing’ as defined in Article 4(2) of the General Data Protection Regulation (EU) 2016/679 (GDPR).
[2] Bitol. (2025). Open Data Contract Standard (ODCS) (v3.1.0). LF AI & Data Foundation. https://bitol-io.github.io/open-data-contract-standard
[3] Dietz, L. W., Wider, A., & Harrer, S. (2025). Automating Data Governance with Generative AI. AAAI/ACM Conference on AI, Ethics, and Society.
[4] Nannini, L., Smith, A. L., Maggini, M. J., Panai, E., Feliciano, S., Tiulkanov, A., Maran, E., Gealy, J., & Bisconti, P. (2026). AI agents under EU law: A compliance architecture for AI providers. arXiv.
