How Generative AI Works: Inside Language Models and LLMs

By Simon-Pierre Morin

This text was written by a human, assisted by a genAI to standardize the tone and spelling of the content. All information has been validated by human experts in the field. The header image was generated by a genAI from the author's prompt.

We saw in the previous article that there are several types of genAI models.

The general principle remains the same whatever the model. Generating text (with an LLM) is not the same operation as generating an image, a sound or a video, but the underlying mechanism is similar. Keep in mind, however, that when a single tool can generate images, podcasts, mind maps, mock exams, text and more, several different tools are usually at work behind it, combined in a way that is transparent to the user.

How big is a generative AI model?

Although the size and density of the neural networks vary, the orders of magnitude are similar: several hundred billion parameters, with each token represented as a vector of several thousand to tens of thousands of dimensions. These vectors pass through dozens to more than a hundred layers of tens of thousands of neurons each, for a total of several million artificial neurons linked by those hundreds of billions of parameters (the weights of their connections). (These figures are based on the openly available Llama3.1 405B and DeepSeek-V3 models.)
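To see how these orders of magnitude fit together, here is a rough back-of-the-envelope sketch in Python. It assumes a Llama-style architecture (grouped-query attention, a SwiGLU feed-forward block, untied input and output embeddings), ignores the small normalization weights, and uses the configuration values published for Llama3.1 405B; the function name `estimate_params` is hypothetical.

```python
def estimate_params(d_model, n_layers, d_ff, vocab_size, n_heads, n_kv_heads):
    """Approximate parameter count of a Llama-style decoder-only transformer."""
    head_dim = d_model // n_heads
    # Attention: query and output projections are d_model x d_model; with
    # grouped-query attention, the key and value projections are smaller.
    attention = 2 * d_model * d_model + 2 * d_model * n_kv_heads * head_dim
    # SwiGLU feed-forward block: gate, up and down projection matrices.
    feed_forward = 3 * d_model * d_ff
    # Input embedding table plus a separate (untied) output projection.
    embeddings = 2 * vocab_size * d_model
    return n_layers * (attention + feed_forward) + embeddings

# Configuration values as published for Llama3.1 405B (treat them as assumptions).
cfg = dict(d_model=16384, n_layers=126, d_ff=53248, vocab_size=128256,
           n_heads=128, n_kv_heads=8)

params = estimate_params(**cfg)
neurons = cfg["n_layers"] * cfg["d_ff"]  # hidden units of the feed-forward layers

print(f"parameters: about {params / 1e9:.1f} billion")
print(f"feed-forward neurons: about {neurons / 1e6:.1f} million")
```

The estimate lands close to the advertised 405 billion parameters. Counting only the feed-forward hidden units gives about 6.7 million "neurons": almost all the parameters are the weights of the connections between them.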

In short, by human standards, it is an enormous amount of computation.

Nowadays, models are no longer described in terms of size and density, but by their number of parameters. Thus, Llama3.1 405B means Llama version 3.1 with 405 billion parameters, and Gemma4 2B means Gemma version 4 with 2 billion parameters. For reference, at the time of writing, the smallest model (Tiny-LLM) is a 10M model (10 million parameters) and the largest is DeepSeek-V3 671B (671 billion parameters). However, future Mixture-of-Experts (MoE) models have been announced with an estimated size of 1.5T to 2T (around 1.5 to 2 trillion parameters). The exact number is a trade secret and has not been published, but given the size of DeepSeek-V3 (itself an MoE model), it seems very plausible.
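This naming convention can be turned into numbers with a few lines of Python. The sketch assumes the size appears as a number followed by M, B or T (a common convention, not a standard, so some names will not follow it); the helpers `parse_param_count` and `weights_memory_gb` are hypothetical. It also estimates the memory needed just to store the weights at 16 bits (2 bytes) per parameter, which makes the scale more concrete.

```python
import re

# Multipliers for the size suffixes used in model names.
SUFFIX = {"M": 10**6, "B": 10**9, "T": 10**12}

def parse_param_count(model_name: str) -> int:
    """Extract the parameter count from a name such as 'Llama3.1 405B'.

    Takes the last number immediately followed by M, B or T.
    """
    matches = re.findall(r"(\d+(?:\.\d+)?)([MBT])\b", model_name)
    if not matches:
        raise ValueError(f"no size suffix found in {model_name!r}")
    number, suffix = matches[-1]
    return round(float(number) * SUFFIX[suffix])

def weights_memory_gb(n_params: int, bytes_per_param: float = 2.0) -> float:
    """Memory needed only for the weights (2 bytes = 16-bit floats), in GB."""
    return n_params * bytes_per_param / 1e9

for name in ["Tiny-LLM 10M", "Gemma4 2B", "Llama3.1 405B", "DeepSeek-V3 671B"]:
    n = parse_param_count(name)
    print(f"{name:>18}: {n:>16,} parameters, ~{weights_memory_gb(n):,.2f} GB at 16 bits")
```

For example, Llama3.1 405B needs about 810 GB for its weights alone at 16 bits, before counting the memory used during computation.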

So how does it work?

An AI must first interpret the user's request. The exact technique varies from one AI to another, but the principle is the same: split what the user entered into tokens, convert them into numbers the machine can process, and turn those into vectors that the machine places in its representation space to give them meaning. Typically, a specialized component called the tokenizer (often a subword algorithm such as Byte-Pair Encoding, BPE) breaks the input into tokens and assigns each one an integer ID. An embedding layer then turns each ID into a vector, and the model performs mathematical transformations on these vectors to relate the words to each other, building representations that capture the message, its intention and its meaning.

Source | Transformer Explainer: LLM Transformer Model Visually Explained
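To make tokenization more concrete, here is a toy version of Byte-Pair Encoding in Python: it starts from individual characters and repeatedly merges the most frequent adjacent pair of symbols. This is a simplified sketch under stated assumptions (a tiny hypothetical corpus, whole words, no byte-level handling or special tokens); production tokenizers learn vocabularies of tens of thousands of tokens or more from very large corpora.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` by one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    vocab = Counter(tuple(word) for word in words)  # each word as a tuple of characters
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of symbols, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair (first one wins ties)
        merges.append(best)
        vocab = Counter({merge_pair(s, best): f for s, f in vocab.items()})
    return merges

def tokenize(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

corpus = "low lower lowest new newer newest".split()  # hypothetical tiny corpus
merges = learn_bpe(corpus, num_merges=5)

# Vocabulary: every single character plus every merged symbol, each with an integer ID.
vocabulary = sorted({ch for word in corpus for ch in word} | {a + b for a, b in merges})
token_to_id = {token: i for i, token in enumerate(vocabulary)}

tokens = tokenize("lowest", merges)
print(merges)
print(tokens, [token_to_id[t] for t in tokens])
```

On this corpus the first merge is `w` + `e`, since that pair appears in four of the six words. A word like `lowest` ends up split into fewer, longer tokens, and each token is then replaced by its integer ID before reaching the embedding layer.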

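The vectors then pass through a series of transformer blocks which, through matrix operations (mainly attention and feed-forward layers), progressively refine the representation of the message. These blocks are not given explicit criteria; during training they learn to capture aspects that we can loosely describe as meaning, intention, domain, characterization, temporality, depth, tone, request, search, etc.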

Source | Transformer Explainer: LLM Transformer Model Visually Explained
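The central matrix operation inside each transformer block is attention, which lets each token's vector take the other tokens into account. The NumPy sketch below shows an embedding lookup followed by a single causal self-attention head. The token IDs, sizes and weights are hypothetical (random instead of learned), and a real model adds many heads, feed-forward layers, normalization and residual connections, repeated across dozens of blocks.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Tiny illustrative sizes; real models use vocabularies of about 10^5 tokens
# and vectors of thousands of dimensions.
vocab_size, d_model = 50, 8
token_ids = np.array([3, 17, 42, 7])  # hypothetical IDs produced by a tokenizer

# Embedding layer: one vector per token ID (random here, learned in practice).
embedding_table = rng.normal(size=(vocab_size, d_model))
x = embedding_table[token_ids]  # shape (seq_len, d_model)

# Query, key and value projection matrices (also learned in practice).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def causal_self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product attention where each token sees only earlier tokens."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every token pair
    # Causal mask: hide tokens that come later in the sequence.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax turns scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights  # each output row mixes the value vectors

out, weights = causal_self_attention(x, W_q, W_k, W_v)
print(out.shape)
print(np.round(weights, 2))
```

Each row of `weights` shows how much one token draws on itself and on the tokens before it. The zeros above the diagonal come from the causal mask, which is what allows the same computation to be reused when generating text one token at a time.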

Then, depending on its type, the model builds its output from the user's vector combined with what it has generated so far. In the case of an image, it starts from random noise generated from the seed and modifies it step by step, deducing the arrangement of pixels that, according to what it learned from its training images, best matches the tokens of the request. In the case of text, the genAI predicts the token most likely to continue the text fluently (or samples among the most likely ones). It then appends that token to its partial response, feeds the extended sequence back through the same model to deduce the next token, and so on, until the next token is the special stop (end-of-sequence) token.
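The text-generation loop can be sketched in a few lines of Python. The "model" here is a hypothetical lookup table that only looks at the last token, standing in for a real LLM that computes probabilities from the whole sequence. The `<bos>` and `<eos>` markers, the `max_new_tokens` safety limit and the optional `temperature` sampling are common conventions, assumed here for illustration.

```python
import random

# Hypothetical toy "model": probabilities of the next token given the LAST token only.
# A real LLM computes such a distribution from the whole sequence with its network.
NEXT_TOKEN_PROBS = {
    "<bos>": {"The": 0.6, "A": 0.4},
    "The": {"cat": 0.5, "dog": 0.3, "<eos>": 0.2},
    "A": {"dog": 0.7, "<eos>": 0.3},
    "cat": {"sleeps": 0.6, "<eos>": 0.4},
    "dog": {"barks": 0.8, "<eos>": 0.2},
    "sleeps": {"<eos>": 1.0},
    "barks": {"<eos>": 1.0},
}

def next_token_distribution(sequence):
    """Stand-in for one forward pass of the model over the current sequence."""
    return NEXT_TOKEN_PROBS[sequence[-1]]

def generate(max_new_tokens=20, temperature=0.0, seed=None):
    """Autoregressive loop: predict a token, append it, feed the sequence back in."""
    rng = random.Random(seed)
    sequence = ["<bos>"]  # start-of-sequence marker
    for _ in range(max_new_tokens):
        probs = next_token_distribution(sequence)
        if temperature == 0.0:
            token = max(probs, key=probs.get)  # greedy: the single most likely token
        else:
            # Sampling: p ** (1 / T) sharpens (T < 1) or flattens (T > 1) the distribution.
            candidates = list(probs)
            weights = [probs[t] ** (1.0 / temperature) for t in candidates]
            token = rng.choices(candidates, weights=weights, k=1)[0]
        if token == "<eos>":  # stop token: the answer is complete
            break
        sequence.append(token)
    return " ".join(sequence[1:])

print(generate())                          # greedy decoding
print(generate(temperature=1.0, seed=42))  # sampled decoding, reproducible with a fixed seed
```

With `temperature=0.0` the loop always picks the most likely token and returns `The cat sleeps`. A positive temperature samples among the candidates, so the same request can produce different answers unless the seed is fixed.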

Some genAIs write long answers by default, while others keep them more succinct. You can always instruct a genAI to be more concise or more verbose, which changes the default length of its responses.

Still curious? If you are interested in this subject and would like to learn more, I suggest you visit the following sources: