
How Transformers think: The flow of information that made language models work


Getting Started

Thanks to large language models (LLMs), we now have wonderfully useful applications such as Gemini and ChatGPT, to name a few. However, few people realize that the architecture underlying LLMs is called a transformer. This architecture was carefully designed to "think", that is, to process the data that describes human language, in a very particular way. Are you interested in gaining a broader understanding of what goes on inside these so-called transformers?

This article explains, in a gentle, understandable, and non-technical tone, how the transformer models sitting behind LLMs such as ChatGPT process human language and, more concretely, how they predict the next token in a sequence.

First Steps: Making Language Machine-Intelligible

The first key concept to understand is that AI models do not truly understand human language; they understand and work with numbers, and the transformers behind LLMs are no different. Therefore, human language, that is, text, must be converted into a numerical form the transformer can work with before it can be processed in depth.

Put another way, the first few steps that take place before the text enters the main machinery of the transformer are focused on turning raw text into numerical representations that preserve the important structure under the hood. Let's examine these three steps.

Making language understandable by machines (click to enlarge)

// Tokenization

The tokenizer is the first actor to enter the scene. Working in tandem with the transformer model, it is responsible for breaking the raw text into smaller pieces called tokens. Depending on the tokenizer used, these tokens are equivalent to words in most cases, but they can also be parts of words or symbols. In addition, each token in the vocabulary has a unique numeric identifier, and this is the point at which the text is no longer text but numbers: one ID per token, as shown in this example where a simple tokenizer converts a text with five words into five tokens, one per word:

Tokenization of text into tokens
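The idea can be sketched in a few lines of Python. This is a minimal word-level tokenizer with a hypothetical five-word vocabulary invented for illustration; real tokenizers have vocabularies of tens of thousands of entries and often split words into subword pieces:

```python
# A toy word-level tokenizer. The vocabulary below is a made-up example;
# each entry maps a word to its unique numeric token ID.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def tokenize(text):
    """Split the text on whitespace and map each word to its token ID."""
    return [vocab[word] for word in text.lower().split()]

token_ids = tokenize("The cat sat on mat")
print(token_ids)  # [0, 1, 2, 3, 4]
```

From here on, the model only ever sees these numbers, never the original characters.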

// Token Embedding

Next, each token ID is transformed into a d-dimensional vector, that is, an array of numbers of size d. This numerical representation of the token, known as an embedding, captures the meaning of that token, be it a word, a part of a word, or a symbol. The magic lies in the fact that tokens with similar meanings, such as "queen" and "empress", will end up with similar embeddings.
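In practice, the embeddings live in a lookup table: a matrix with one row per vocabulary entry. The sketch below uses a randomly initialized matrix for a tiny hypothetical vocabulary (in a trained model these values are learned, not random):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 5, 8                    # tiny vocabulary, 8-dimensional embeddings
embedding_matrix = rng.normal(size=(vocab_size, d))  # one row per token ID

def embed(token_ids):
    """Look up the d-dimensional embedding vector for each token ID."""
    return embedding_matrix[token_ids]

vectors = embed([0, 1, 2])
print(vectors.shape)  # (3, 8): three tokens, each an 8-number vector
```

During training, the rows of this matrix drift so that semantically related tokens end up near each other in the d-dimensional space.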

// Positional Encoding

Until now, the token embeddings contained information about each token's meaning as a set of numbers, but that information was still tied to a single token in isolation. However, since language comes as a sequence of text, it is important to know not only which words or tokens it contains, but also their position within the text they belong to. The positional encoding process, through the use of sinusoidal functions, adds to each token embedding extra information about its position in the original text sequence.
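The classic sinusoidal scheme from the original transformer paper can be written compactly. This is a minimal sketch: each position gets a unique pattern of sine and cosine values, which is simply added element-wise to the token embeddings:

```python
import numpy as np

def positional_encoding(seq_len, d):
    """Sinusoidal positional encodings: sin on even dimensions, cos on odd."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d, 2)[None, :]             # even dimension indices
    angles = positions / np.power(10000.0, dims / d)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=5, d=8)
# The encoding is simply added to the embeddings:
#   embeddings_with_position = embeddings + pe
print(pe.shape)  # (5, 8)
```

Because each position produces a distinct pattern, two identical words at different positions now have different overall representations.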

Transforming Within the Transformer Model

Now that the numerical representation of each token includes information about its position in the text sequence, it is time to enter the first layer of the main body of the transformer model. Transformers are very deep architectures, with many replicated layers stacked throughout the system. There are two types of transformer layers, the encoder layer and the decoder layer, but for the sake of simplicity we will not make a strong distinction between them in this article. Just note for now that there are two types of layers in a transformer, although both are very similar.

Transformation within the transformer model (click to enlarge)

// Multi-Head Attention

This is the first major subprocess that occurs within a transformer layer, and perhaps the most distinctive feature of transformer models compared to other types of AI systems. Multi-head attention is a mechanism that allows each token to notice, or "pay attention" to, other tokens in the sequence, collecting and incorporating useful contextual information into its own representation: linguistic features such as grammatical relationships, long-range dependencies between words that are not close to each other in the text, or semantic similarity. Thanks to this mechanism, various aspects of compatibility and relationship between parts of the original text are successfully captured. After a token's representation passes through this stage, it ends up richer and more aware of its context.
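At its core, each attention head computes scaled dot-product attention. The sketch below shows a single head and, for simplicity, uses the token representations directly as queries, keys, and values; a real transformer first passes them through separate learned projection matrices and runs several heads in parallel:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each token's query is scored against every key; the softmaxed
    scores then mix the value vectors into a context-aware output."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))            # 5 tokens, d = 8; Q = K = V = x here
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape)             # (5, 8): same shape in, same shape out
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

Each row of `weights` says how much that token "attends" to every other token in the sequence.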

Some transformer architectures are designed for specific tasks, such as translating text from one language to another. These analyze interdependencies between tokens by attending to both the input text and the output (translated) text produced so far:

Multi-head attention in translation transformers

// Feed-Forward Neural Network Sublayer

In simple terms, after passing through attention, the second common stage within every transformer layer is a feed-forward neural network that helps learn further refinements of our token representations. This process is like continuing to distill those representations, identifying and reinforcing relevant features and patterns. Ultimately, these layers are the means by which the model gradually learns, little by little, a deeper understanding of the entire text being processed.
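This sublayer is applied to every token's representation independently. A minimal sketch, assuming random weights in place of the learned ones: expand to a wider hidden layer, apply a non-linearity, and project back to the original size:

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward network: expand, apply ReLU, project back.
    Applied identically and independently to every token representation."""
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU non-linearity
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
d, d_ff = 8, 32                  # the hidden layer is typically wider than d
x = rng.normal(size=(5, d))      # 5 token representations
W1, b1 = rng.normal(size=(d, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)), np.zeros(d)
out = feed_forward(x, W1, b1, W2, b2)
print(out.shape)  # (5, 8): shape is preserved, so layers can be stacked
```

Because the output shape matches the input shape, the result can flow straight into the next transformer layer.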

The process of passing through the multi-head attention and feed-forward sublayers is repeated many times, in that order: as many times as the number of stacked transformer layers we have.

// Final Destination: Predicting the next word

After repeating the previous two steps many times, the representations of the tokens from the original text should have allowed the model to gain a very deep understanding, making it possible to capture both broad and subtle relationships. At this point, we come to the last part of the transformer stack: a special layer that transforms the final representation into a score for every token in the vocabulary. That is, we calculate, based on all the information learned along the way, the probability of each word in the vocabulary being the next word the transformer model (or LLM) should output. The model finally selects the token or word with the highest probability as the next one it generates as part of the end-user output. This entire process repeats for every word generated as part of the model's response.
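This final step can be sketched as a projection to vocabulary-sized scores (logits), a softmax into probabilities, and a greedy pick of the most likely token. The tiny vocabulary and random weights below are illustrative stand-ins for a trained model's:

```python
import numpy as np

def predict_next_token(final_hidden, W_out, vocab):
    """Project the final hidden state to one logit per vocabulary entry,
    softmax into probabilities, and pick the most likely next token."""
    logits = final_hidden @ W_out                  # (vocab_size,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                           # softmax
    return vocab[int(np.argmax(probs))], probs

rng = np.random.default_rng(42)
d = 8
vocab = ["the", "cat", "sat", "on", "mat"]         # toy vocabulary
final_hidden = rng.normal(size=(d,))               # last token's representation
W_out = rng.normal(size=(d, len(vocab)))
word, probs = predict_next_token(final_hidden, W_out, vocab)
print(word)        # the model's greedy pick for the next word
print(probs.sum()) # the probabilities sum to 1
```

Real LLMs often sample from these probabilities rather than always taking the maximum, which is what makes their outputs vary from run to run.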

Wrapping Up

This article provided a gentle, conceptual tour of the journey text-based information takes as it flows through the signature model architecture behind LLMs: the transformer. After reading this, hopefully you have gained a better understanding of what goes on inside models like those behind ChatGPT.

Iván Palomares Carrascosa is a leader, writer, and consultant in AI, machine learning, deep learning, and LLMs. He trains and guides others in applying AI in the real world.
