What Is Artificial Intelligence and Generative AI?
This text was written by a human assisted by a genAI to standardize the tone and spelling of the content. All information has been validated by human experts in the field. The header image was generated by a genAI from the author's prompt.
In today's world, generative conversational agents are becoming increasingly prevalent. Microsoft Copilot, Google Gemini (formerly Bard), Claude Sonnet, Grok, ChatGPT, Llama, Kimi K2... There's something for everyone. Used properly, these tools can speed up many tasks, but their use carries certain risks. To understand those risks properly, you need to know what these tools are and how they work.
In a series of four articles, I'd like to take you on a voyage of discovery into the world of agents in generative AI.
A little vocabulary
Artificial intelligence
Artificial intelligence has become a catch-all term, but it's important to note that it covers different categories.
Analytical Artificial Intelligence (AI)
Sometimes also referred to as classical artificial intelligence, it includes techniques such as fuzzy logic, neural networks, genetic algorithms, discrete mathematics, and statistical models, which analyze an input and give a precise statistical output. What distinguishes AI from traditional programming is that the weighting is calculated by automated training rather than set manually by a human.
We encounter these AIs every day without realizing it: a thermostat, a scanner with text detection (OCR), a GPS, a voice generator, some switches, some automatons, some vacuum cleaners, drone stabilizers, an autonomous guidance system, and many more. Contrary to popular belief, AI has been around us since the late 1950s. What has changed is the computing power and the availability of these technologies.
Generative Artificial Intelligence (genAI)
This is the wave of tools that emerged in the 2020s and is in full swing at the time of writing. GenAI tools are predictive models for the production of various media (text, image, music, sound, video, file, etc.). They rely on a series of complex matrix and vector operations, known as transformers, to calculate the token most likely to continue the current generation. The resulting output is then fed back into the «machine» repeatedly until the processing is complete and the «desired» result has been obtained.
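The feedback loop described above can be sketched in a few lines of Python. This is only a toy illustration: the score table is invented, and it looks at the last token alone, whereas a real model takes the whole sequence generated so far into account.

```python
# Toy stand-in for a trained model: for each token, the scores of possible next tokens.
# The table and its numbers are invented for illustration only.
NEXT_TOKEN_SCORES = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.7, "<end>": 0.3},
    "dog": {"sleeps": 0.8, "<end>": 0.2},
    "sleeps": {"<end>": 1.0},
}


def most_likely_next(token):
    """Return the highest-scoring continuation of the given token."""
    scores = NEXT_TOKEN_SCORES[token]
    return max(scores, key=scores.get)


def generate(max_tokens=10):
    """Feed the output back in, one token at a time, until an end marker appears."""
    output = ["<start>"]
    for _ in range(max_tokens):
        next_token = most_likely_next(output[-1])
        if next_token == "<end>":
            break
        output.append(next_token)
    return output[1:]  # drop the start marker


print(" ".join(generate()))
```

Always picking the most likely token makes this sketch fully predictable. Real generators add a controlled dose of randomness at this step.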
AGI (Artificial General Intelligence)
A dream for some, a fear for others. For the moment, AGI is purely theoretical. It would be a model that possesses all knowledge and can perform any task 100% autonomously, without any human intervention. This would be a point of singularity where the machine would achieve true «consciousness» and self-improvement. Technological limits prevent this point from being reached today, but many researchers in the field remain convinced that it will happen at some indeterminate point in the future (let's just say that some are more optimistic than others).
Artificial hallucination
These are mistakes made by an AI or genAI. It's important to remember that genAI tools are statistical models and have no real knowledge of the subject matter. Depending on the training data and/or the randomness of the generation, incongruous artifacts are therefore regularly included in the output. These «errors» are mainly due to training bias and confirmation bias (the tendency to agree with the user), but also to the randomness the statistical model uses to make each answer «unique».
Internal operation
Token
This is the language unit of generative AI. A token corresponds to a number representing a part of a word (suffix, prefix, root, etc.) or a symbol (such as punctuation or special characters like braces, arrows or emojis) used to reconstruct words and sentences.
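To make the idea concrete, here is a minimal sketch of a tokenizer. The vocabulary is hypothetical and tiny, and the greedy longest-match strategy is a simplification chosen for readability. Real tokenizers learn tens of thousands of pieces from data and use more elaborate algorithms.

```python
# Hypothetical vocabulary mapping word pieces and symbols to numbers.
VOCAB = {"un": 0, "break": 1, "able": 2, "!": 3, " ": 4, "re": 5, "do": 6}
ID_TO_PIECE = {number: piece for piece, number in VOCAB.items()}


def encode(text):
    """Split text into known pieces (longest match first) and return their numbers."""
    ids = []
    position = 0
    while position < len(text):
        match = None
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, position):
                match = piece
                break
        if match is None:
            raise ValueError(f"no token covers the text at position {position}")
        ids.append(VOCAB[match])
        position += len(match)
    return ids


def decode(ids):
    """Rebuild the text from a list of token numbers."""
    return "".join(ID_TO_PIECE[number] for number in ids)


print(encode("unbreakable!"))
print(decode(encode("redo un")))
```

The model itself only ever sees the numbers. The text is reconstructed from them at the very end.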
Parameters
Parameters are the internal values that define how a model «thinks». The higher their number, the greater the capabilities, but also the higher the costs, as larger models generally require far more resources. There are strategies for reducing the cost of very large models, but the bottom line is that these parameters are at the heart of the artificial brain's reasoning.
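As a rough illustration of why more parameters mean more resources, the sketch below counts the parameters of a small fully connected network and estimates the memory needed just to store a model. The layer sizes, the 7 billion figure and the 2 bytes per parameter (16-bit numbers) are assumptions chosen for the example, not values from a specific product.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_inputs, n_outputs in zip(layer_sizes, layer_sizes[1:]):
        total += n_inputs * n_outputs  # one weight per connection
        total += n_outputs             # one bias per neuron
    return total


def memory_gigabytes(n_parameters, bytes_per_parameter=2):
    """Estimate the storage needed for the parameters alone."""
    return n_parameters * bytes_per_parameter / 1e9


# A tiny network: 4 inputs, 8 hidden neurons, 2 outputs.
print(count_parameters([4, 8, 2]))

# A hypothetical model with 7 billion parameters stored as 16-bit numbers.
print(memory_gigabytes(7_000_000_000))
```

The estimate covers storage only. Running the model requires additional memory for intermediate calculations.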
Temperature
Temperature is a parameter that controls the AI's level of creativity (or caution). The lower the temperature, the more predictable and factual the AI's responses. The higher the temperature, the more freedom the AI takes in its choice of words and ideas. For «mainstream» genAI, this parameter is generally hidden from the user, but it contributes to the randomness by reshaping the probability distribution over tokens, making each answer «unique».
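One common way to apply temperature is to divide the model's raw scores by it before converting them into probabilities with a softmax. The sketch below assumes that approach. The three scores are invented for the example.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in scores]
    highest = max(scaled)  # subtracting the maximum avoids numerical overflow
    exponentials = [math.exp(value - highest) for value in scaled]
    total = sum(exponentials)
    return [value / total for value in exponentials]


scores = [2.0, 1.0, 0.1]  # hypothetical raw scores for three candidate tokens
cold = softmax_with_temperature(scores, 0.2)
hot = softmax_with_temperature(scores, 2.0)

print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

At low temperature almost all of the probability goes to the top candidate. At high temperature the candidates become closer to equally likely, which is where the extra «freedom» comes from.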
Seed
The «seed» is the other parameter that governs randomness in generation. Although more visible in image generation, it is involved in all generators. For text generators, it interacts with context, length, conversation history and so on. Note, however, that this random factor accumulates as the output loops through the transformers several times. So although the same seed with the same prompt and temperature value should, in theory, give the same output, this is not quite the case. In practice, many factors make it very difficult to reproduce an output faithfully. The outputs will be very similar, but a keen eye will be able to identify the differences.
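The theoretical behavior of a seed can be shown with Python's standard random number generator. The token list and weights are invented. In this simplified setting the same seed does reproduce the same draw exactly, which a full generation pipeline does not guarantee in practice.

```python
import random


def sample_tokens(seed, count=5):
    """Draw tokens at random, with the randomness fixed by the seed."""
    generator = random.Random(seed)
    tokens = ["sun", "rain", "wind", "snow"]   # hypothetical candidates
    weights = [0.4, 0.3, 0.2, 0.1]             # hypothetical probabilities
    return generator.choices(tokens, weights=weights, k=count)


first_run = sample_tokens(seed=42)
second_run = sample_tokens(seed=42)

print(first_run)
print(first_run == second_run)
```

The second line printed is True here because nothing else varies between the two runs.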
Prompt
This is the instruction the AI is given to work with. The quality of the output is largely determined by the quality of the input query structure.
Context window
Think of it as the agent's short-term memory. It is essentially what the AI is able to perceive in a «conversation». It has a specific limit. Once this limit is reached, the oldest information is lost to the AI. It's at this point that the lack of coherence with the initial context we had set becomes apparent. There are mechanisms in place to try and reduce this «memory loss», such as compression, summaries and reinjections, but it's important to understand that AI will always have its limits.
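A very simple way to picture this limit is a budget of tokens filled starting from the most recent message. The sketch below assumes, for simplicity, that one word costs one token and that older messages are dropped whole. Actual tools count tokens differently and may summarize instead of dropping.

```python
def fit_to_context(messages, max_tokens):
    """Keep the most recent messages that fit in the token budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # start from the newest message
        cost = len(message.split())     # assumption: one word = one token
        if used + cost > max_tokens:
            break                       # everything older is lost
        kept.append(message)
        used += cost
    kept.reverse()                      # restore chronological order
    return kept


history = [
    "My name is Ada",
    "I like green tea",
    "What is the weather today",
    "Suggest a drink for me",
]
visible = fit_to_context(history, max_tokens=12)
print(visible)
```

With a budget of 12 tokens, the two oldest messages no longer fit, so the agent can no longer «see» the user's name or preference when asked for a suggestion.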
Artificial neural networks (ANN)
This is a technique in artificial intelligence that enables a machine to make a statistical decision based on its training data. Traditionally, training an ANN involves providing desired output data for known inputs. The system weights each neuron to arrive at this result. By repeating this operation on millions or even billions of data sets, the model is fine-tuned and becomes semi-autonomous. This technique has been in use for several decades (on a smaller scale), notably in text recognition (OCR) and image recognition.
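The training principle can be illustrated with a single artificial neuron learning the logical AND function. The sketch uses the classic perceptron learning rule, one of the oldest and simplest training methods. It is not how today's large networks are trained, but the idea of adjusting weights from known input and output pairs is the same.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs is positive, otherwise stay silent (0)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_neuron(examples, epochs=20, learning_rate=1):
    """Adjust the weights a little every time the neuron gives a wrong answer."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Known inputs with their desired outputs: the logical AND function.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_neuron(AND_EXAMPLES)
for inputs, target in AND_EXAMPLES:
    print(inputs, predict(weights, bias, inputs), target)
```

Nobody typed the final weights by hand: they emerge from repeated corrections, which is what separates this approach from traditional programming.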
Transformer
Transformers are the heart of output prediction. They pass the current input (the prompt, images or other media), as well as the partial output in progress, through a set of automated mathematical transformations and neural networks specialized in understanding its «latent vector space».

These processing modules are not, strictly speaking, autonomous, but rather represent a sequence for arriving at the desired result. This AI technique speeds up processing and enables dynamic improvement, but it is first and foremost a gigantic statistical calculation.
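One widely used building block of transformers is scaled dot-product attention, in which each token's vector is rebuilt as a weighted blend of all the token vectors. The sketch below is a bare-bones version written under simplifying assumptions: the vectors are invented, and the same vectors serve as queries, keys and values, whereas real transformers derive these through learned matrices and stack many such steps.

```python
import math


def softmax(values):
    """Convert scores into weights that sum to 1."""
    highest = max(values)
    exponentials = [math.exp(value - highest) for value in values]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def attention(queries, keys, values):
    """For each query, blend the values according to how well the query matches each key."""
    dimension = len(keys[0])
    outputs = []
    for query in queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dimension)
            for key in keys
        ]
        weights = softmax(scores)
        blended = [
            sum(weight * value[j] for weight, value in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs


# Three hypothetical token vectors in a 2-dimensional space.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)
for row in result:
    print([round(x, 3) for x in row])
```

Everything here is multiplication, addition and exponentials, which matches the description of a gigantic statistical calculation.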
Language models
For generic conversational agents, this is their nerve center. There are different types of models. Each is a huge statistical model indicating which tokens are most likely to follow the preceding tokens. The major differences are the number of input and output tokens, and the training data used. Here are a few of the most common.
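The notion of a statistical model of what comes next can be illustrated with the simplest possible case: counting, in a tiny invented corpus, which word follows which. Real language models are vastly larger and consider long sequences rather than a single preceding word, so this is only an analogy in code.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_token_probabilities(counts, token):
    """Convert the follower counts of a word into probabilities."""
    followers = counts[token]
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}


# Invented training data.
corpus = ["the cat sleeps", "the cat eats", "the dog sleeps", "the cat sleeps"]
model = train_bigram(corpus)

print(next_token_probabilities(model, "the"))
print(next_token_probabilities(model, "cat"))
```

Changing the corpus changes the probabilities, which is why the training data matters so much to what a model produces.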
Large Language Model (LLM)
These models have been trained on huge collections of varied text. Their strength lies mainly in the generation of fluent text, but they are not specialized in the concepts they mention. They are the models with the greatest number of parameters, and their information is deliberately left unspecialized in order to broaden the lexical field.
Small Language Models (SLM)
It is now possible to run generative AI natively on a phone, but to do so, the model has to be scaled down. An SLM is designed to reduce the size of an LLM without losing the required core capability. Depending on the domain requiring reduction, this may involve removing some of the less common concepts, or reducing the accessible vocabulary. In the latter case, the «diversity» of answers is reduced, and the limited vocabulary may be reminiscent of the pre-constructed answers of its ancestors, but the model still performs very well for common instructions.
Specialized models
Some agents have been specialized for a particular field. This is the case of Gemini's business range (examples: MedLM in medicine, SecLM in cybersecurity, Code Assist, Imagen, Veo, Lyria, and many others). Their language model was specialized by considerably reducing and refining a general model. The training texts are works dealing with very specific, more precise subjects in the target field. This allows greater precision on specific topics while maintaining performance. Although artificial hallucinations remain possible, this type of model is generally less prone to them (91.1% accuracy on US medical license exams in the case of MedLM).
Mixture of Experts (MoE)
Other agents have specialized in several domains simultaneously. Their language model is therefore very heavy, but offers the possibility of cross-searching between domains. This is the case with DeepSeek, for example. To reduce energy consumption, only some of the experts are activated, according to the domain detected. The model is therefore not fully active at all times. However, it requires a lot of space to store the parameters, even when they are inactive. A misinterpretation of the user's intention can also cause an inconsistent response.
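The selective activation described above can be sketched as a router that scores the experts and runs only the best few. Everything in this example is hypothetical: the four «experts» are trivial functions, and the router scores are hard-coded, whereas in a real model both are learned during training.

```python
def route(router_scores, top_k=2):
    """Return the positions of the top_k highest-scoring experts."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda index: router_scores[index],
        reverse=True,
    )
    return ranked[:top_k]


def mixture_output(x, experts, router_scores, top_k=2):
    """Run only the selected experts and blend their answers by relative score."""
    active = route(router_scores, top_k)
    total = sum(router_scores[index] for index in active)
    value = sum(router_scores[index] / total * experts[index](x) for index in active)
    return value, active


# Four toy experts. All of them must be stored, but only two will run.
EXPERTS = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
]
scores = [0.1, 0.6, 0.1, 0.2]  # hypothetical router scores for this input

value, active = mixture_output(3.0, EXPERTS, scores)
print(active, round(value, 2))
```

If the router scores the input poorly, the wrong experts answer, which is one way to picture how a misread intention can lead to an inconsistent response.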
Multimodal Large Language Models (MLLM)
Some architectures use the term Vision Language Models (VLM). Generally speaking, these models can interpret not only words, but also pixels and audio, in the same conceptual space. One of the pioneers was DeepMind. Although some tools prefer to keep things light by retaining an LLM or SLM, the presence of MLLMs in mainstream tools has become almost standard.
Other types of models
Latent Concept Models (LCM) for discovering hidden meanings in text or hidden patterns in data.
Language Action Models (LAM), used, among other things, for automata with repercussions in the physical world (word-to-action conversion).
Masked Language Models (MLM), trained to predict the words hidden from a sentence and used in some of Google's systems to understand the user's real intention in their query.
Segment Anything Models (SAM), used by Meta, among others, to isolate objects in an image in response to user interactions.
And what about the agents?
Agents are essentially a convenient package offering the user an initial configuration including templates, drives and parameters for different generations. They can also integrate orchestrators when the tool offers the ability to perform several distinct tasks. For example, Gemini can be used to perform calculations, generate images, videos and text, or help with editing, programming and learning, among other functions.
