
Why Do LLMs Corrupt Your Documents When You Delegate?

# Corruption Through Delegation

We are entering a new era of AI in which delegation is reshaping how we interact with these systems. Users no longer just chat with an AI that answers their questions: they increasingly delegate long-horizon tasks, from editing source code to formatting professional documents or even managing accounting books. In doing so, they trust AI systems to an unprecedented degree to preserve the integrity of files such as documents across many interactions.

However, recent research has revealed a problem: when you delegate a task to a large language model (LLM), it may silently corrupt the documents you provided. To understand this issue, the researchers behind the study summarized here built a robust experimental framework called "DELEGATE-52". This benchmark covers 52 professional domains, from formal documentation to Python coding, music notation, and crystallography.

The authors tested 19 different LLMs using a clever simulation method based on "round trips": the model is asked to perform an edit and is then given the exact inverse instruction to undo it. Ideally, the model restores the original document exactly as it was. In practice, even the most capable models, such as Gemini Pro, Claude Opus, and GPT-5, corrupted about 25% of the original document's content after 20 interactions, while weaker models reached 50%.
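The round-trip idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the DELEGATE-52 harness: the `Model` type, the toy `exact_model` and `lossy_model` functions, and the difflib-based `preserved_fraction` score are all hypothetical stand-ins for real LLM calls and for whatever scoring the study actually uses. What it does show is the key design choice: after every round trip, the document is scored against the original, so any damage that survives one round trip carries into the next.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical interface (not from the study): a "model" is any function that
# takes (document, instruction) and returns the edited document.
Model = Callable[[str, str], str]


def round_trip(model: Model, document: str, forward: str, inverse: str) -> str:
    """Apply an edit, then its inverse; a perfect model returns `document`."""
    edited = model(document, forward)
    return model(edited, inverse)


def preserved_fraction(original: str, current: str) -> float:
    """Share of the original's characters still found, in order, in `current`.

    A stand-in similarity score based on difflib, not the study's metric.
    """
    if not original:
        return 1.0
    matcher = SequenceMatcher(None, original, current, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(original)


def simulate(model: Model, document: str, pairs: list[tuple[str, str]]) -> list[float]:
    """Chain round trips and score against the ORIGINAL document after each one."""
    current = document
    scores = []
    for forward, inverse in pairs:
        current = round_trip(model, current, forward, inverse)
        scores.append(preserved_fraction(document, current))
    return scores


# Toy stand-ins for real LLM calls, purely for illustration.
def exact_model(doc: str, instruction: str) -> str:
    return doc.upper() if instruction == "uppercase" else doc.lower()


def lossy_model(doc: str, instruction: str) -> str:
    # Performs the same edit, but silently drops the final word on every call.
    return exact_model(doc, instruction).rsplit(" ", 1)[0]


doc = "the quick brown fox jumps over the lazy dog"
pairs = [("uppercase", "lowercase")] * 3
print(simulate(exact_model, doc, pairs))
print(simulate(lossy_model, doc, pairs))
```

The exact toy model scores 1.0 after every round trip, while the lossy one scores lower after each successive round trip, even though every individual step changes only a single word.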

# Why Models Corrupt Your Documents

Let's look at why this structural decay of content occurs. The researchers identified several causes:

// 1. Compounding Errors

As in the classic game of "telephone", small mistakes made by an LLM can quietly add up until they become significant. A single edit may introduce only minor, localized errors, but over a long sequence of edits those errors compound, seriously degrading the document.
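A simple way to see why small errors matter is a toy geometric-decay model. This is an assumption for illustration only, not a model from the study: suppose each edit independently corrupts a fixed fraction of whatever content is still intact. The per-edit rates below are hypothetical, picked so that the totals after 20 edits land near the 25% and 50% figures quoted earlier.

```python
def surviving_fraction(per_edit_loss: float, edits: int) -> float:
    """Toy model (an assumption): each edit independently corrupts
    `per_edit_loss` of whatever content is still intact, so the intact
    share decays geometrically as (1 - loss) ** edits."""
    return (1.0 - per_edit_loss) ** edits


# Hypothetical per-edit loss rates, not figures reported by the study.
for loss in (0.005, 0.014, 0.034):
    corrupted = 1.0 - surviving_fraction(loss, 20)
    print(f"{loss:.1%} lost per edit -> {corrupted:.0%} corrupted after 20 edits")
```

Under this toy model, losing only about 1.4% per edit is enough to corrupt roughly a quarter of the document after 20 edits, which is why individually unremarkable errors become serious over long delegated sessions.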

// 2. Weak Models Delete, Smart Ones Hallucinate

The study highlights a striking difference in how different types of models fail. Weak models tend to delete: they discard content by mistake, so the problem becomes visible after a few interactions because the document as a whole noticeably shrinks. For frontier LLMs, however, the root of the problem is not deletion but corruption: they preserve the "look and feel" of the whole document, even keeping an almost identical word count, but quietly mistype, alter, or replace factual information with fabrications that still sound plausible. Here's the irony: the smarter the model, the harder its corruption is to spot, since the result still looks reasonable at first glance.
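The difference between the two failure modes can be illustrated with Python's standard `difflib`. This is a sketch under assumptions: the invoice text is made up, and `word_count_ratio` and `substituted_words` are hypothetical helpers, not tools from the study. A length check catches the weak model's deletion, but the frontier-style fabrication keeps the word count intact; only a word-level diff against the original surfaces the swapped facts.

```python
import difflib


def word_count_ratio(original: str, edited: str) -> float:
    """Edited length relative to the original, in whitespace-separated words."""
    return len(edited.split()) / max(len(original.split()), 1)


def substituted_words(original: str, edited: str) -> list[tuple[str, str]]:
    """(original, replacement) word spans that a word-level diff marks as replaced."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


# Made-up example records, for illustration only.
original = "Invoice 1042 was paid on 3 March for 1,250 EUR by Acme Ltd."
deleted = "Invoice 1042 was paid on 3 March."  # weak-model failure: content dropped
corrupted = "Invoice 1042 was paid on 8 March for 1,520 EUR by Acme Ltd."  # plausible fabrication

print(word_count_ratio(original, deleted))    # noticeably below 1.0: easy to spot
print(word_count_ratio(original, corrupted))  # exactly 1.0: looks untouched
print(substituted_words(original, corrupted))
```

The corrupted version passes the length check perfectly, yet the diff reports two changed facts (the date and the amount), which mirrors the study's point that frontier-model corruption hides behind an intact surface.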

// 3. Large Contexts and Distractor Files

In messy environments, with a lot of contextual information or too many attached documents, models struggle to keep information intact. As the document grows, or as more distractor files are included in the immediate context, the deterioration becomes more severe: the model loses its grip on the exact information and fills the gaps with what it predicts should be there. Instead of adhering to the source text, it finds it easier to simply guess.

// 4. Domain Familiarity Matters

A final reason why models degrade documents over long, delegated interactions has to do with the nature of the use case and how familiar the model is with it.

Not all files degrade equally in delegated workflows. According to the research, LLMs do well in highly structured domains such as Python source code. It is when they are pushed into natural-language tasks or niche formats that they quickly lose the strong internal understanding needed to keep files fully consistent.

# Does Agentic AI Help?

Even when LLMs are enhanced with agentic tools, such as the ability to run code or to read and write files directly, the problem of delegation-induced document corruption does not go away. Agentic add-ons do nothing to address the problem at its root: the core transformer model underlying the LLM. We need to rethink how the reliability of long-running AI workflows can be guaranteed. Until then, using LLMs as fully unsupervised editors remains a high-risk gamble.

Iván Palomares Carrascosa is a leader, author, speaker, and consultant in AI, machine learning, deep learning and LLMs. He trains and guides others in using AI in the real world.
