Generative AI

This AI Paper Introduces the KoLMogorov-Test: A Compression-as-Intelligence Benchmark for Evaluating Code-Generating Language Models

Compression lies at the heart of Kolmogorov complexity, which defines the complexity of a sequence as the length of the shortest program that produces it. Unlike traditional compression methods that exploit statistical redundancy, Kolmogorov's framework treats compression as a search for structured patterns that can be expressed as code. While the idea promises optimal compression in principle, it is uncomputable in general. However, the emergence of large language models that generate code offers an opportunity to evaluate how close today's systems can come: whether they can compress data by writing correct, concise programs rather than by mere pattern matching.
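To make the idea concrete, here is a minimal sketch (our illustration, not code from the paper) contrasting a sequence stored literally with a short program that regenerates it; the length of the shortest such program is, by definition, the sequence's Kolmogorov complexity.

```python
# Toy illustration: a 200-byte sequence stored literally versus a short
# program that regenerates it. Kolmogorov complexity is the length of the
# shortest such program; a good compressor finds the rule, not the bytes.
sequence = bytes(range(100)) * 2          # 200 bytes of raw data

program = "bytes(range(100)) * 2"         # a 21-character description

# The program counts as a valid compression only if it reproduces the data.
assert eval(program) == sequence
print(f"data: {len(sequence)} bytes, program: {len(program)} bytes")
```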

The core problem is that current tools fall short when asked to compress data sequences by generating code. Models often emit programs that simply reproduce the input verbatim, revealing a gap in genuine pattern understanding. This is especially evident with audio, text, or DNA sequences, where intricate structure must be exploited to achieve meaningful compression. The central challenge is ensuring that a model's program both reproduces the sequence exactly and remains short and logically composed. Moreover, models that perform well on controlled synthetic tests routinely fail to generalize to natural data, which is essential for practical applications.
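As a hypothetical illustration of this failure mode (our example, not one from the paper): a model that hard-codes the bytes emits a program longer than the input, while one that discovers the generating rule achieves real compression.

```python
# Contrast between the two behaviors discussed above: a "compressor" that
# hard-codes the bytes versus one that names the underlying rule.
data = bytes([i % 16 for i in range(64)])           # 64 structured bytes

naive = f"bytes({list(data)})"                      # verbatim repetition
concise = "bytes([i % 16 for i in range(64)])"      # exploits the pattern

assert eval(naive) == data and eval(concise) == data
print(len(data), len(naive), len(concise))          # e.g. 64, ~230, 34
```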

Many compression tools already exist, ranging from traditional algorithms such as GZIP to newer neural compressors. GZIP remains a strong baseline, especially for long or repetitive sequences, thanks to its effective statistical encoding. More recently, language models have been combined with arithmetic coding, using their predicted probabilities to compress input data. However, these methods typically require full access to the model's weights at decompression time, which limits their efficiency and practicality. Prompted code models such as GPT-4 and LLaMA have been tested in zero-shot settings to generate Python programs that reproduce input sequences, but they often produce long, incorrect programs with limited success, especially on unseen or complex sequences.
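A hedged sketch of that zero-shot setup (our reconstruction, not the paper's code): ask a code model for a short Python program that prints a given sequence, execute it in isolation, and accept it only on an exact match. `query_model` is a hypothetical stand-in for any chat-completion API.

```python
# Zero-shot compression-by-code evaluation loop: prompt for a program,
# run it in a subprocess, and give credit only for exact reproduction.
import subprocess

def query_model(prompt: str) -> str:
    raise NotImplementedError  # e.g. a GPT-4 or Llama API call

def compress_by_code(sequence: list[int]) -> str | None:
    prompt = ("Write the shortest Python program that prints exactly "
              f"this list and nothing else: {sequence}")
    program = query_model(prompt)
    result = subprocess.run(["python", "-c", program],
                            capture_output=True, text=True, timeout=10)
    if result.returncode == 0 and result.stdout.strip() == str(sequence):
        return program   # a correct (if not necessarily short) compression
    return None          # incorrect programs earn no credit
```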

Researchers from Meta AI and Tel Aviv University introduce the KoLMogorov-Test (KT), a benchmark for the reasoning ability of code-generating language models. The test evaluates a model's ability to produce the shortest program that outputs a given input sequence. Unlike typical benchmarks, KT emphasizes logical composition and program generation over predictive text modeling. Its data include natural sequences from audio (LibriSpeech), text (Wikipedia enwik9), and DNA (GRCh38), as well as synthetic sequences produced by a custom domain-specific language (DSL). The DSL supports compositional sequence construction through operations such as range creation, modification, merging, and filtering.
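The paper's DSL is not reproduced here, but the sketch below conveys the flavor of such a sequence language: each operator builds or transforms an integer sequence, and a program is a composition of operators. The operator names are our own, not the paper's.

```python
# A hypothetical mini-DSL in the spirit of KT's sequence language.
def rng(start, stop, step=1):             # range creation
    return list(range(start, stop, step))

def rep(seq, n):                          # repetition
    return seq * n

def interleave(a, b):                     # merge two sequences alternately
    return [x for pair in zip(a, b) for x in pair]

def filt(seq, mod, r):                    # keep elements where x % mod == r
    return [x for x in seq if x % mod == r]

# Example program: even numbers below 20 interleaved with a repeated constant.
print(interleave(filt(rng(0, 20), 2, 0), rep([7], 10)))
```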

The researchers developed an automated framework that generates synthetic sequence-program pairs using the DSL. These pairs are used to train and evaluate models, including both prompted pretrained models and specially trained ones such as SEQCODER. Performance is measured with two metrics: accuracy, i.e., whether the generated program reproduces the sequence exactly, and precision, i.e., how concise the program is compared to GZIP compression. The evaluation covered sequences of various lengths, with synthetic programs averaging about 76 bytes and input sequences capped at 128 bytes.
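A minimal sketch of the two metrics as described above (our reading, not the paper's exact definitions): accuracy requires byte-exact reproduction, and precision compares the program's length against the GZIP-compressed length of the same sequence, so values below 1.0 beat GZIP.

```python
# Accuracy: does the program's output match the sequence byte-for-byte?
# Precision: program length relative to the GZIP encoding of the sequence.
import zlib

def accuracy(program_output: bytes, sequence: bytes) -> bool:
    return program_output == sequence

def precision_vs_gzip(program_source: str, sequence: bytes) -> float:
    gzip_len = len(zlib.compress(sequence, 9))
    return len(program_source.encode()) / gzip_len  # < 1.0 beats GZIP

seq = bytes([i % 16 for i in range(128)])
prog = "bytes([i % 16 for i in range(128)])"
print(accuracy(eval(prog), seq), round(precision_vs_gzip(prog, seq), 2))
```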

The results show that even the strongest models struggle. GPT-4 reached 69.5% accuracy on high-quality audio but dropped to 36.4% on 8-bit audio and 50.3% on DNA data. LLaMA-3.1-405B fared worse, with accuracy as low as 3.9% on audio and only 24.8% on DNA. On synthetic data, SEQCODER-8B achieved 92.5% accuracy with a precision score of 0.56, outperforming traditional tools such as GZIP. However, its accuracy on real-world data remained near zero. This contrast between success on synthetic benchmarks and failure on varied, noisy real data highlights the limitations of current training regimes and underscores the need for new strategies.

Altogether, the study clearly demonstrates the difficulty of compression by code generation. The KT benchmark provides a rigorous and diverse testbed for model reasoning, exposing a sharp divide between performance on synthetic training distributions and on real-world data. The released data and evaluation framework set a high bar for future models aiming to unify reasoning with compression, but substantial innovation is still required to meet this challenge.


Check out the paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us and don't forget to join our 85k+ ML SubReddit.


Nikhil is an intern at MarktechPost. He is pursuing an integrated dual degree at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who researches applications in fields such as biomaterials and biomedical science. With a strong background in materials science, he explores new advancements and creates opportunities to contribute.
