7 Popular LLMs Explained in 7 Minutes


Image by Author | Canva
We use large language models in many of our daily tasks. These models have been trained on billions of online documents and diverse datasets, allowing them to understand, generate, and respond in human-like language. However, not all LLMs are built the same way. While the core idea remains similar, they differ in their underlying architectures, and these differences have a major impact on their capabilities. For example, as seen across various benchmarks, DeepSeek excels at reasoning tasks, Claude performs well at coding, and ChatGPT stands out in creative writing.
In this article, I will walk you through 7 popular LLM architectures to give you a clear overview, in just a minute or so each. So, let's get started.
1. BERT
Paper Link: https://arxiv.org/pdf/1810.04805
Developed by Google in 2018, BERT marked a major shift in natural language understanding by introducing deep bidirectional attention in language modeling. Unlike earlier models that read text left-to-right or right-to-left, BERT uses a transformer encoder to consider both directions simultaneously. It is trained on two tasks: masked language modeling (predicting randomly masked words) and next sentence prediction (determining whether one sentence plausibly follows another). Architecturally, BERT comes in two sizes: BERT Base (12 layers, 110M parameters) and BERT Large (24 layers, 340M parameters). Its structure relies solely on encoder stacks and includes special tokens such as [CLS] to represent the full sentence and [SEP] to separate two sentences. You can fine-tune it for tasks like sentiment analysis, question answering (as in SQuAD), and more. It was the first model of its kind to truly grasp the full meaning of sentences.
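To make the masked-language-modeling objective concrete, here is a minimal sketch (not from the original paper) using the Hugging Face transformers library and the public bert-base-uncased checkpoint; the example sentence is my own:

```python
# A minimal sketch of BERT's masked-language-modeling objective,
# using the Hugging Face `transformers` library (pip install transformers).
from transformers import pipeline

# Load the 12-layer, 110M-parameter BERT Base checkpoint.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT reads the sentence bidirectionally, so context on BOTH sides
# of [MASK] informs the prediction. Prints the top candidate words.
for pred in fill_mask("The goal of [MASK] analysis is to detect opinions in text."):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```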
2. GPT
Paper Link (GPT-4): https://arxiv.org/pdf/2303.08774
The GPT (Generative Pre-trained Transformer) family was introduced by OpenAI. The series began with GPT-1 in 2018 and evolved to GPT-4 in 2023, with the latest version, GPT-4o, released in May 2024, notable for handling both text and images. These models are pre-trained on very large text corpora with a next-token prediction objective: at each step, the model predicts the next word in a sequence given all previous words. After this pre-training phase, the same model can be fine-tuned for specific tasks or used in a zero-shot or few-shot fashion with little to no additional parameter updates. The decoder-only design means GPT attends only to previous tokens, unlike BERT's bidirectional encoder. What was striking was the scale and power of GPT: as successive generations (GPT-2, GPT-3) grew to billions of parameters, the models exhibited remarkable few-shot learning abilities, establishing the "pre-train and prompt/fine-tune" paradigm for large language models. However, these models are proprietary, with access mostly via API, and their exact architectures, especially for the latest versions, are not fully disclosed.
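To illustrate next-token prediction, here is a minimal sketch using the openly available GPT-2 checkpoint as a stand-in for its proprietary successors; the Hugging Face pipeline shown is my tooling choice, not OpenAI's API:

```python
# A minimal sketch of GPT-style autoregressive generation: the model
# repeatedly predicts the next token given all previous tokens.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Greedy decoding (do_sample=False) appends the single most likely
# next token up to 20 times; the decoder never sees future positions.
result = generator("Large language models are", max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```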
3. LLaMA
Llama 4 Blog Link: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
Paper Link (Llama 3): https://arxiv.org/abs/2407.21783
LLaMA, developed by Meta AI and first released in February 2023, is a family of open, decoder-only transformer models, with sizes ranging up to 70 billion parameters across generations and the latest version, Llama 4, released in April 2025. Architecturally, the LLaMA models adopt efficiency-minded refinements: they use a SwiGLU activation instead of GELU, rotary positional embeddings (RoPE) instead of absolute positional embeddings, and RMSNorm in place of LayerNorm. The original family was released in multiple sizes, from 7B to 65B parameters in LLaMA 1, with later generations such as Llama 3 continuing the goal of making capable models easily available. Notably, despite modest parameter counts, these models performed competitively: Meta reported that LLaMA's 13B model outperformed OpenAI's 175B GPT-3 on multiple benchmarks, and its 65B model was competitive with Google's PaLM and DeepMind's Chinchilla. The open release (initially under a research-only license) greatly expanded public use; LLaMA's significance lies in its efficient training and in opening access to model weights.
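For a sense of what these architectural swaps look like in code, here is a rough PyTorch sketch of RMSNorm and a SwiGLU feed-forward block; dimensions and names are illustrative, not Meta's actual implementation:

```python
# Sketch of two LLaMA-style building blocks: RMSNorm (in place of
# LayerNorm) and a SwiGLU feed-forward (in place of a GELU MLP).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the root mean square instead of mean and variance.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # SwiGLU: a SiLU-gated linear unit, as used in LLaMA's MLP blocks.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)            # (batch, sequence, model dim)
y = SwiGLU(512, 1376)(RMSNorm(512)(x))
print(y.shape)                         # torch.Size([2, 16, 512])
```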
4. PaLM
PaLM 2 Technical Report: https://arxiv.org/abs/2305.10403
Paper Link (PaLM): https://arxiv.org/pdf/2204.02311
PaLM (Pathways Language Model) is a series of large language models developed by Google Research. The original PaLM (announced in 2022) was a 540-billion-parameter, decoder-only transformer and part of Google's Pathways system. It was trained on a high-quality corpus of 780 billion tokens across thousands of TPU v4 chips in Google's infrastructure, using parallelism to achieve high hardware utilization. The model also uses multi-query attention to minimize memory requirements during inference. PaLM is known for its few-shot learning abilities, performing well on new tasks from only a handful of examples thanks to its large and diverse training data, which includes web documents, books, Wikipedia, news, and GitHub code. PaLM 2, announced in May 2023, further improved multilingual, reasoning, and coding capabilities, powering applications such as Google Bard and Workspace AI features.
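Here is a rough PyTorch sketch of multi-query attention, the inference-time optimization mentioned above: every head keeps its own queries, but all heads share one key/value head, shrinking the KV cache. Shapes and names are illustrative, not Google's code:

```python
# Sketch of multi-query attention (MQA): per-head queries, but a single
# shared K and V head broadcast across all query heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q = nn.Linear(dim, dim, bias=False)                 # per-head queries
        self.kv = nn.Linear(dim, 2 * self.head_dim, bias=False)  # ONE shared K and V head
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(x).split(self.head_dim, dim=-1)
        # The KV cache stores one head instead of n_heads, cutting
        # inference memory; expand() broadcasts it without copying.
        k = k.unsqueeze(1).expand(b, self.n_heads, t, self.head_dim)
        v = v.unsqueeze(1).expand(b, self.n_heads, t, self.head_dim)
        att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(att.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(1, 8, 256)                   # (batch, sequence, model dim)
print(MultiQueryAttention(256, 8)(x).shape)  # torch.Size([1, 8, 256])
```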
5. Gemini
Gemini 2.5 Blog: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
Paper Link (Gemini 1.5): https://arxiv.org/abs/2403.05530
Paper Link (Gemini): https://arxiv.org/abs/2312.11805
Gemini is Google's next-generation LLM family (from Google DeepMind and Google Research), introduced in late 2023. Gemini models are built to be multimodal, meaning they are designed from the ground up to handle text, images, and code in one model. Like PaLM and GPT, Gemini is based on the transformer, but its key features include massive scale, support for very long context, and (in Gemini 1.5) a Mixture-of-Experts (MoE) architecture for efficiency. For example, Gemini 1.5 ("Pro") uses sparsely activated expert layers (many small expert networks, with only a few active per input) to grow capacity without a proportional increase in compute; a minimal sketch of this kind of sparse routing appears in the DeepSeek section below. The Gemini 2.5 series, introduced in March 2025, built on this foundation with "deep thinking" capabilities. In June 2025, Google released Gemini 2.5 Flash and Pro as stable versions, along with Flash-Lite, its most cost-efficient and fastest version yet, optimized for high-throughput tasks and coding. The Gemini family comes in multiple sizes (Ultra, Pro, Nano) so it can run from cloud servers down to mobile devices. The combination of multimodal pretraining and MoE-based scaling makes Gemini a flexible, highly capable foundation model.
6. Mistral
Paper Link (Mistral 7B): https://arxiv.org/abs/2310.06825
Mistral is a French AI startup that released its first LLMs in 2023. Its flagship model, Mistral 7B (released September 2023), is architecturally a GPT-style decoder-only transformer with efficiency-focused refinements: it uses grouped-query attention (GQA) to speed up inference and sliding-window attention to handle longer contexts efficiently. In terms of performance, Mistral 7B outperformed Meta's Llama 2 13B and even gave strong competition to 34B models, despite being far smaller. Mistral AI released the model under the Apache 2.0 license, making it freely available for use. Its next major release was Mixtral 8×7B, a sparse Mixture-of-Experts (MoE) model with eight 7B-parameter expert networks in each layer. This design helped Mixtral match or beat GPT-3.5 and Llama 2 70B on tasks such as math, coding, and multilingual benchmarks. In May 2025, Mistral released Mistral Medium 3, a mid-sized model aimed at enterprises. It delivers more than 90% of the scores of pricier models such as Claude 3.7 Sonnet on common benchmarks while dramatically cutting the per-token cost. It supports multimodal tasks (text + images) and strong reasoning, and is offered via API or for on-prem deployment on as few as four GPUs. However, unlike earlier models, Medium 3 is closed-source, which drew community criticism that Mistral was drifting away from its open-source roots. Shortly afterward, in June 2025, Mistral introduced Magistral, its first model dedicated to explicit reasoning. The smaller variant is open under Apache 2.0, while Magistral Medium is enterprise-only. Magistral Medium scored 73.6% on AIME 2024, with the small version scoring 70.7%, demonstrating strong math and logic capabilities across multiple languages.
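To visualize the sliding-window attention mentioned above, here is a tiny sketch of the mask it implies: each token attends only to itself and the previous window-1 positions, so attention cost scales with the window rather than the full sequence. The window size here is a toy value, not Mistral's production setting:

```python
# Sketch of a sliding-window attention mask: causal attention
# restricted to the most recent `window` positions.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    # Allowed: causal (j <= i) AND within the window (i - j < window).
    return (j <= i) & (i - j < window)

print(sliding_window_mask(6, 3).int())
# tensor([[1, 0, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0, 0],
#         [1, 1, 1, 0, 0, 0],
#         [0, 1, 1, 1, 0, 0],
#         [0, 0, 1, 1, 1, 0],
#         [0, 0, 0, 1, 1, 1]])
```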
7. DeepSeek
Paper Link (DeepSeek-R1): https://arxiv.org/abs/2501.12948
DeepSeek is a Chinese AI company (a spin-off of High-Flyer AI, founded in 2023) developing large LLMs. Its latest models (such as DeepSeek V3 and DeepSeek-R1) use a sparsely activated Mixture-of-Experts transformer architecture. In DeepSeek V3/R1, each transformer layer has hundreds of expert sub-networks, but only a few are activated per token. Instead of engaging every part of the model at once, the model routes each input to a small subset of its expert networks depending on what that input needs. This lets DeepSeek have a huge total model size (about 671 billion parameters) while activating only about 37 billion of them per response, making it faster and cheaper to run than a dense model of comparable size. Like other modern LLMs, it uses SwiGLU activations, rotary positional embeddings (RoPE), and optimizations such as FP8 mixed-precision training. This aggressive MoE design allows DeepSeek to achieve very high capability (comparable to much larger dense models) at a fraction of the compute cost. DeepSeek's models (released under open licenses) attracted attention by rivaling leading models such as GPT-4 in multilingual generation and reasoning, all while significantly reducing training and inference requirements.
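Here is a toy PyTorch sketch of the top-k expert routing described above (used in spirit by Mixtral and Gemini 1.5 as well); the expert count, dimensions, and routing details are illustrative, not DeepSeek's actual configuration:

```python
# Sketch of sparse Mixture-of-Experts routing: a router scores every
# expert per token, and only the top-k experts run for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts, bias=False)  # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (tokens, dim)
        scores, idx = self.router(x).topk(self.k, dim=-1)  # keep only top-k experts
        weights = F.softmax(scores, dim=-1)                # mixing weights for chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                         # only k experts run per token
            for e in idx[:, slot].unique():
                sel = idx[:, slot] == e                    # tokens routed to expert e
                w = weights[sel, slot].unsqueeze(-1)
                out[sel] += w * self.experts[int(e)](x[sel])
        return out

tokens = torch.randn(10, 64)           # 10 tokens, model dimension 64
print(MoELayer(64)(tokens).shape)      # torch.Size([10, 64])
```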
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.


