
Models That Prove Their Own Correctness

How can we trust that a learned model is correct on a particular input of interest? A model's accuracy is usually measured on average over an input distribution, which guarantees nothing about any individual input. This paper proposes a theoretically grounded answer: training Self-Proving models that prove the correctness of their output to a verification algorithm V via an Interactive Proof. A Self-Proving model satisfies the guarantee that, with high probability over inputs drawn from a given distribution, it produces a correct output and successfully proves its correctness to V. The soundness of V ensures that, on every input, no model can convince V of an incorrect output. Thus, a Self-Proving model proves the correctness of most of its outputs, while every incorrect output (of any model) is rejected by V. The authors propose and analyze two generic methods for learning Self-Proving models: Transcript Learning (TL), which relies on access to transcripts of accepting interactions, and Reinforcement Learning from Verifier Feedback (RLVF), which trains the model against the verifier itself.
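To make the completeness/soundness guarantee concrete, here is a toy sketch (not taken from the paper) of a prover-verifier pair for computing gcd(a, b): the model outputs the gcd together with a Bezout certificate (x, y), and the verifier V accepts only if the certificate checks out. The function names and the choice of task are illustrative assumptions.

```python
def model(a: int, b: int):
    """A toy honest 'Self-Proving' model: returns an output g and a
    proof (x, y) such that a*x + b*y == g (extended Euclidean algorithm)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, (old_x, old_y)

def verifier(a: int, b: int, g: int, proof) -> bool:
    """V accepts iff g divides both inputs and the Bezout identity holds.
    Soundness: any claimed g' != gcd(a, b) fails one of these checks,
    so no prover strategy can convince V of a wrong output."""
    x, y = proof
    return g > 0 and a % g == 0 and b % g == 0 and a * x + b * y == g

g, proof = model(252, 105)
print(g)                                  # 21
print(verifier(252, 105, g, proof))       # honest model: True
print(verifier(252, 105, g + 1, proof))   # wrong output: False
```

Divisibility forces any accepted g to divide gcd(a, b), while the Bezout identity forces gcd(a, b) to divide g, so only the true gcd is ever accepted; this mirrors how V's soundness bounds the damage any model, honest or not, can do.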

