A Geometric Method for Detecting Fraud Without an LLM Jury

Picture a flock of birds in flight.
There is no leader. There is no central command. Each bird aligns with its neighbors: matching direction, adjusting speed, maintaining coherence through local interaction. The result is global order from local rules.
Now imagine one bird flying as confidently as the others. Its wingbeats are steady. Its speed is right. But its heading does not match its neighbors'. It is a red bird.
It is not lost. It is not hesitant. It simply is not moving with the flock.
Hallucinations in LLMs are red birds.
The problem we are actually trying to solve
LLMs produce fluent, confident writing that can contain fabricated information. They invent legal cases. They cite papers that were never written. They state facts in the same tone whether those facts are true or completely made up.
A common way to catch this is to ask another language model to examine the output: LLM-as-a-judge. You can see the problem right away: we are using a system that hallucinates to detect hallucinations. It is like asking someone who cannot distinguish colors to sort paint samples. They will give you an answer. It might even be right sometimes. But they do not actually see the thing they need to see.
The question we asked was different: can we see hallucinations in the geometric structure of the text itself, without needing the perspective of another language model?
What embeddings actually do
Before we get to the finding, I want to step back and establish what we are working with.
When you feed text into a sentence encoder, you get a vector: a point in a high-dimensional space. Similar documents land near each other. Unrelated documents land far apart. That is what contrastive training produces. But there is a subtler structure here than “similar things are close.”
Think about what happens when you embed a question and its answer. The question lands somewhere in embedding space. The answer lands somewhere else. The vector that connects them, which we call the displacement, points in a particular direction. It is a vector: a magnitude and an angle.
We noticed that for grounded answers within a given domain, these displacement vectors point in consistent directions. They share something: their angle.
If you ask five questions on the same topic and get five well-grounded answers, the displacement from question to answer is nearly the same each time. Not identical (the magnitudes vary, the angles wobble slightly), but the overall direction is consistent.
When the model hallucinates, something different happens. The answer still lands somewhere in embedding space. It is still fluent. It still sounds like an answer. But its displacement does not follow the shared pattern. It points somewhere else: a vector with a noticeably different angle.
The red bird flies confidently. But not with the flock. It moves at a completely different angle from the other birds.
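To make the picture concrete, here is a toy numpy sketch with hand-picked 3-D vectors (real embeddings have hundreds of dimensions, and none of these numbers come from the paper): grounded displacements share a direction, while a stray one does not.

```python
import numpy as np

def unit(v):
    """Normalize a vector to unit length."""
    return v / np.linalg.norm(v)

# Three toy "question" embeddings; 3-D keeps the geometry visible.
questions = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.5, 0.5, 0.0]])

# Grounded answers sit roughly in the same direction from each question.
grounded = questions + np.array([[0.10, 0.00, 1.00],
                                 [0.00, 0.10, 0.90],
                                 [0.05, 0.05, 1.10]])

# A "hallucinated" answer displaced in an unrelated direction.
hallucinated = questions[0] + np.array([1.0, -0.2, -0.9])

displacements = grounded - questions           # question -> answer vectors
mean_dir = unit(displacements.sum(axis=0))     # the flock's shared heading

for d in displacements:
    print("grounded cosine:    ", round(float(unit(d) @ mean_dir), 3))
print("hallucinated cosine:", round(float(unit(hallucinated - questions[0]) @ mean_dir), 3))
```

The grounded cosines come out near 1.0; the stray displacement's cosine is negative. That single number is the whole signal.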
Displacement Consistency (DC)
We formalize this as Displacement Consistency (DC). The idea is simple:
- Build a reference set of grounded question-answer pairs from your domain
- For a new question-answer pair, find the nearest neighboring questions in the reference set
- Compute the mean displacement direction of those neighbors
- Measure how well the new displacement aligns with that direction

Grounded answers score consistently high. Hallucinated answers do not. That is all. One cosine similarity. No source documents required at inference time. No multiple generations. No model internals.
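The four steps above fit in one short function. This is a minimal sketch assuming embeddings arrive as numpy arrays; the names (`dc_score`, `k`) are mine, not the paper's.

```python
import numpy as np

def dc_score(q_emb, a_emb, ref_q, ref_a, k=5):
    """Displacement Consistency sketch.

    q_emb, a_emb : (d,) embeddings of the new question and answer
    ref_q, ref_a : (n, d) embeddings of grounded reference Q-A pairs
    Returns a cosine similarity; high = consistent with the domain's
    grounded displacement direction, low = suspect.
    """
    # 1. Nearest reference questions by cosine similarity.
    qn = q_emb / np.linalg.norm(q_emb)
    rn = ref_q / np.linalg.norm(ref_q, axis=1, keepdims=True)
    neighbors = np.argsort(rn @ qn)[-k:]

    # 2. Mean displacement of those neighbors.
    mean_disp = (ref_a[neighbors] - ref_q[neighbors]).mean(axis=0)

    # 3. One cosine: how well does the new displacement align?
    disp = a_emb - q_emb
    return float(disp @ mean_disp
                 / (np.linalg.norm(disp) * np.linalg.norm(mean_disp)))
```

On a synthetic domain whose grounded answers share a common offset, a pair that follows the offset scores near 1.0 and a pair that moves against it scores near -1.0; the reference set itself is the only supervision.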
And it works remarkably well. Across five different embedding models and multiple hallucination benchmarks, including HaluEval and TruthfulQA, DC achieves near-perfect discrimination.
The catch: domain locality
We tested DC across five embedding models chosen for architectural diversity, including contrastive fine-tuning on MPNet (all-mpnet-base-v2), weakly supervised pre-training (E5-large-v2), instruction-tuned training with hard negatives (BGE-large-en-v1.5), and nomic-embed-text-v1.5. If DC only worked with one architecture, it might be an artifact of that particular model. Consistent results across architecturally different models would suggest the structure is real.
The results were consistent. DC achieved an AUROC of 1.0 for all five models on our synthetic benchmark. But a perfect score on synthetic data can be misleading; perhaps our synthetic hallucinations were simply too easy to identify.
So we validated on established hallucination benchmarks: HaluEval-QA, which contains LLM-generated hallucinations specifically designed to be plausible; HaluEval-Dialogue, with responses that drift from the dialogue context; and TruthfulQA, which tests common misconceptions people tend to believe.
DC maintained full discrimination on all of them. No degradation from synthetic to real benchmarks.
For comparison, position-based baselines that measure where answers land relative to questions (rather than which direction they move) achieved AUROCs of around 0.70–0.81. The gap, about 0.20 absolute AUROC, is large and consistent across every model tested.
The score distributions tell the story visually. Grounded responses cluster tightly at high DC values (around 0.9). Hallucinated responses spread across low values (around 0.3). The distributions barely overlap.
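AUROC here is just the probability that a randomly chosen grounded score outranks a randomly chosen hallucinated one, so it can be computed directly from two score arrays. The distributions below are shaped like the ones described above (means 0.9 and 0.3) but are synthetic illustrations, not the paper's data.

```python
import numpy as np

def auroc(pos_scores, neg_scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive scores higher,
    counting ties as half a win."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

rng = np.random.default_rng(1)
grounded_scores = rng.normal(0.9, 0.05, size=500)
hallucinated_scores = rng.normal(0.3, 0.15, size=500)
print(auroc(grounded_scores, hallucinated_scores))
```

With tight, well-separated distributions the value sits near 1.0; the more the score distributions overlap, the closer it falls to 0.5 (chance).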
DC achieves full discrimination within a single domain. But when you try to use a reference set from one domain to detect hallucinations in another, performance drops to chance: AUROC around 0.50. This tells us something important about how embedding spaces encode groundedness. It is like watching different flocks in the sky: each flock follows its own path.
The easiest way to picture this is with what geometry calls a “fiber bundle.”
The surface in Figure 1 is a base manifold representing all possible questions. At each point in this space there is a fiber: a direction indicating where grounded answers go. In any local region (a single domain), all the fibers point the same way. That is why DC works so well locally.
But globally, across different regions, the fibers point in different directions. The “grounded direction” for legal questions is not the “grounded direction” for medical questions. There is no single universal pattern. Only local consistency.
Bird migration routes between Europe and Africa make the same point. Different species follow different corridors, each internally coherent, none universal: a fiber bundle in the sky.
In differential geometry, this structure is locally trivial without being globally trivial. Each patch of the manifold looks simple and consistent on its own. But the patches cannot be stitched into a single global coordinate system.
This has a concrete consequence:
grounding is not a universal geometric property
There is no single “grounded direction” in embedding space. Each domain, each task type, each embedding model develops its own displacement pattern during training. The patterns are real and measurable, but they are domain-specific. The birds do not all migrate in one direction.
What does this mean in practice?
For deployment, domain locality means you need a small calibration set (about 100 examples) matched to your specific use case. A legal Q&A system needs legal examples. A medical chatbot needs medical examples. This is a one-time upfront cost (the calibration happens offline), but it cannot be skipped.
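That one-time cost can be organized as one calibration set per domain, with each query routed to the matching set. The sketch below uses illustrative class and method names (nothing here is from the paper) and random stand-in embeddings.

```python
import numpy as np

class DomainCalibration:
    """Per-domain reference sets, built once offline."""

    def __init__(self):
        self.domains = {}  # domain name -> (ref_q, ref_a) embedding arrays

    def register(self, domain, ref_q, ref_a):
        """One-time offline step: ~100 grounded Q-A embeddings per domain."""
        self.domains[domain] = (np.asarray(ref_q), np.asarray(ref_a))

    def mean_direction(self, domain):
        """The domain's grounded displacement direction (unit vector)."""
        ref_q, ref_a = self.domains[domain]
        d = (ref_a - ref_q).mean(axis=0)
        return d / np.linalg.norm(d)

# Two toy domains whose grounded directions point opposite ways,
# which is why a legal reference set cannot score medical answers.
calib = DomainCalibration()
rng = np.random.default_rng(2)
legal_q = rng.normal(size=(100, 8))
calib.register("legal", legal_q, legal_q + np.ones(8))
medical_q = rng.normal(size=(100, 8))
calib.register("medical", medical_q, medical_q - np.ones(8))

print(calib.mean_direction("legal") @ calib.mean_direction("medical"))
```

In this contrived setup the two domains' directions are anti-aligned (cosine near -1), so scoring one domain against the other's reference set would invert the signal entirely.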
For understanding embeddings, the finding suggests these models encode richer structure than we usually give them credit for. They do not just learn “similarity.” They learn a domain-specific map in which deviation from the local direction reliably signals hallucination.
The red bird does not declare itself
A hallucinated answer does not carry a label saying “I was fabricated.” It is fluent. It is confident. It looks exactly like a grounded answer on every surface-level metric.
But it does not move with the flock. And now we can measure that.
The geometry was always there, etched into embedding space by the way contrastive training shapes it. We are just learning to read it.
Notes:
You can find the full paper at
If you have any questions about the topics discussed, feel free to contact me at [email protected]


