[English Version] LLM Glossary

LLM Glossary in English Language

Agents

An Agent is an LLM (Large Language Model) assigned a specific role within a larger system. It has the same capabilities as any LLM (i.e., understanding human language), but it is instructed to carry out specific tasks. Separating a system into Agents lets each one perform in a more focused and efficient way, tailored to a specific objective.

For example, an LLM Chatbot providing customer service might have primary services such as order processing and answering frequently asked questions (FAQ). We can simply divide the Agents into three types:

Manager Agent: Welcomes customers and delegates tasks to the relevant Agents.

Order Agent: Responsible for processing orders.

FAQ Agent: Responsible for answering frequently asked questions.

For instance, when a customer first engages with the system, they will encounter the Manager Agent. If the customer wishes to place an order, the Manager Agent processes this request and delegates it to the Order Agent, which specializes in handling orders.
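The delegation flow can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and simple keyword matching stands in for the intent detection a real Manager Agent would do with an LLM.

```python
def order_agent(message: str) -> str:
    # Hypothetical Order Agent; a real one would call an LLM with order-specific instructions.
    return f"[Order Agent] Processing your order request: {message}"


def faq_agent(message: str) -> str:
    # Hypothetical FAQ Agent; a real one would call an LLM with FAQ-specific instructions.
    return f"[FAQ Agent] Looking up an answer for: {message}"


def manager_agent(message: str) -> str:
    # Assumption: keyword matching replaces LLM-based intent detection.
    order_keywords = ("order", "buy", "purchase")
    if any(word in message.lower() for word in order_keywords):
        return order_agent(message)
    return faq_agent(message)


print(manager_agent("I would like to order two pizzas"))
print(manager_agent("What are your opening hours?"))
```

Each Agent stays small and focused, and adding a new service means adding one function and one routing rule.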

You can learn more about LLM Agents at this Link.

Few-shot learning

Few-shot learning is a method by which an LLM (Large Language Model) learns a task from only a few examples, usually supplied directly in the prompt rather than through retraining. For example, if we want to teach the model to translate a simple regional dialect, we can start by telling it that the word "บ่" means "ไม่" (no/not) and provide an example. When later asked to translate the regional dialect, the LLM can then translate text containing the word "บ่".
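A few-shot prompt for this dialect case might be assembled as follows. The helper name, the prompt wording, and the query phrase are illustrative assumptions. The resulting string would be sent to whichever LLM you use.

```python
def build_few_shot_prompt(examples, query):
    # Hypothetical helper: puts worked examples before the new input.
    lines = ["Translate the regional dialect into standard Thai."]
    for dialect, standard in examples:
        lines.append(f"Dialect: {dialect}\nStandard: {standard}")
    # The final entry is left open so the model completes it.
    lines.append(f"Dialect: {query}\nStandard:")
    return "\n\n".join(lines)


examples = [("บ่", "ไม่")]  # the single example pair from the text
prompt = build_few_shot_prompt(examples, "บ่ไป")  # illustrative query phrase
print(prompt)
```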

You can see additional examples at the link below.

Fine-tuning

Fine-tuning is the process of adjusting a pre-trained LLM (Large Language Model) or a model trained from scratch to better suit the specific tasks we intend to use it for. This is accomplished by further training the model on additional or specialized data, which updates its weights. Adjusting inference settings (such as temperature) or managing prompts appropriately can also shape the responses, but these do not change the model itself.

For example, if you want to use an LLM in the medical field, you may need to fine-tune the LLM to ensure the quality of responses is suitable for medical use. This is because most base LLMs are not configured to respond in a medical context or contain extensive medical information.

You can learn more about using prompts with technical terms from the link below.

🔎Technical term (Retrieve)

Framework

A framework or data framework is a structural model that facilitates the connection of processes from start to finish, enabling the creation of LLM applications more efficiently and easily. You can use the functions of a framework to develop LLM applications without starting from scratch, which significantly reduces development time.

Just as JavaScript has various frameworks like Angular, Vue.js, or React, there are several frameworks available for developing LLM applications. Popular ones include LlamaIndex and LangChain.

You can study an example of using LlamaIndex as a framework for creating a chatbot from the link below.

Guardrail

Guardrail is a technique used to set boundaries for the responses of an LLM (Large Language Model). By defining rules or constraints, you can control the LLM's operations, preventing errors or deviations from its intended purpose.

For example, if you create an LLM Chatbot for hospital services, you can use the Guardrail technique to ensure that the Chatbot only responds to topics related to your hospital. Without these constraints, users might use the Chatbot for unrelated tasks, potentially causing damage to your business.
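A very simple guardrail for the hospital case could look like the sketch below. The topic list, refusal message, and `call_llm` stub are assumptions for illustration. Production guardrails typically rely on classifiers or dedicated libraries rather than keyword checks.

```python
ALLOWED_TOPICS = ("appointment", "doctor", "clinic", "visiting hours", "hospital")
REFUSAL = "Sorry, I can only help with questions about our hospital services."


def call_llm(message: str) -> str:
    # Stub standing in for a real LLM call (hypothetical).
    return f"(LLM answer to: {message})"


def guarded_reply(message: str) -> str:
    # Refuse anything that does not mention an allowed topic.
    if not any(topic in message.lower() for topic in ALLOWED_TOPICS):
        return REFUSAL
    return call_llm(message)


print(guarded_reply("How do I book an appointment?"))
print(guarded_reply("Write me a poem about cats"))
```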

You can learn more about applying Guardrail to Chatbots from this Link.

Hallucination

Hallucination is an event where an LLM (Large Language Model) or NLP (Natural Language Processing) system generates information or text that does not align with reality or lacks supporting data, resulting in incorrect outputs.

This can occur due to various reasons, such as incomplete or noisy training data, ambiguous questions, or model bias from the learned data. Hallucinations can significantly impact applications requiring high accuracy, such as in medicine, law, or engineering.

Large Language Models (LLMs)

A large language model is a sizable AI model trained to understand and generate human language. It acts like an artificial brain that consumes vast amounts of linguistic data, learning to comprehend meanings, contexts, and various ways of using language. LLMs are developed based on deep learning algorithms and serve as the foundation for generative AI.

Examples of LLMs include GPT-4, Claude 3.5 Sonnet, and SeaLLM v3. Some providers offer playgrounds and APIs for developers to experiment with their models.

For those interested in trying out SeaLLM through LLM as a Service, you can use the App Float16.cloud via the link below.

Overfitting

Overfitting is a behavior of an AI model where it learns the details of a specific training dataset too well. Besides learning useful information, the model may also learn noise in the data. This can happen when the model is too complex. When predicting data it has seen before, the model can be very accurate, but when it encounters new, unseen data, it may fail to make correct predictions and perform poorly.
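The gap between accuracy on seen and unseen data can be illustrated with a toy example. Here a hypothetical "memorizer" (a nearest-neighbor lookup) reproduces every training label, including one deliberately noisy label, while a simpler rule ignores the noise. The data and both models are invented for illustration only.

```python
# True pattern: label is True when x >= 5. The label at x=3 is deliberate noise.
TRAIN = [(1, False), (2, False), (3, True), (4, False),
         (6, True), (7, True), (8, True), (9, True)]
TEST = [(0.5, False), (2.6, False), (3.4, False), (5.5, True), (9.5, True)]


def memorizer(x):
    # Overly flexible model: copies the label of the closest training point.
    _, nearest_label = min(TRAIN, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def simple_rule(x):
    # Simpler model that captures only the general pattern.
    return x >= 5


def accuracy(model, data):
    return sum(model(x) == label for x, label in data) / len(data)


for name, model in (("memorizer", memorizer), ("simple_rule", simple_rule)):
    print(name, "train:", accuracy(model, TRAIN), "test:", accuracy(model, TEST))
```

The memorizer is perfect on the training set but worse than the simple rule on the unseen points near the noisy label.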

For example, suppose we create a model to predict whether an image contains kitchen utensils. If the training images all have a gas stove in the background, the model might learn to associate kitchen utensils with the presence of a gas stove. If it then encounters an image of kitchen utensils without a gas stove in the background, it might incorrectly predict that the image does not contain kitchen utensils.

In the context of LLMs, overfitting can occur when we create a model to help write stories. The model might produce responses very similar to the training text, lacking creativity. This is problematic for models involved in creative tasks or generative AI.

In addition to overfitting, there is also underfitting, which occurs when an AI model is too simple or has had too little training to capture the patterns in the data, leading to incorrect answers and high bias.

Pre-trained

Pre-trained models are AI models that have already been trained on large datasets. These models are typically trained with diverse and comprehensive data to understand and handle various languages or information effectively.

Training large models from scratch requires significant resources and time. However, using pre-trained models can reduce the time and resources needed to train a new model. These pre-trained models can be used and further improved (fine-tuned) to suit specific tasks more easily.

Examples of pre-trained language models include Llama 3, Gemma, and Mamba. These models can be found on platforms like Hugging Face.

Prompt Engineering

A "prompt" is a text or question that we input into an LLM (Large Language Model) to instruct it to generate information and answers. However, the responses may not always align with our expectations. Therefore, we need to engage in a process called "Prompt Engineering" to ensure that the answers produced meet our requirements.

"Prompt Engineering" can occur during the creation of the prompt or while asking questions. This involves specifying conditions more clearly or configuring the LLM to have a particular character or response style as we desire.

You can learn about Basic Prompting from the link below.

📚Prompting
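The idea of specifying conditions and a response style, described in this section, can be sketched as a small prompt builder. The function name, persona, and conditions are hypothetical. The point is only that the same question can be wrapped with clearer instructions before it is sent to an LLM.

```python
def build_prompt(question, persona=None, conditions=()):
    # Hypothetical helper that wraps a question with a persona and explicit conditions.
    parts = []
    if persona:
        parts.append(f"You are {persona}.")
    for condition in conditions:
        parts.append(f"- {condition}")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


bare = build_prompt("What is quantization?")
engineered = build_prompt(
    "What is quantization?",
    persona="a patient teacher explaining AI to beginners",
    conditions=("Answer in at most three sentences.", "Avoid jargon."),
)
print(engineered)
```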

Quantization

Quantization is the process of reducing the precision of the parameters in an AI model to make the model smaller, save memory, and increase computational speed, without significantly compromising performance.
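As a rough illustration, the sketch below applies symmetric 8-bit quantization to a handful of made-up weights. It is one simple scheme among many and is not necessarily what any particular LLM runtime uses. The function names are hypothetical.

```python
def quantize_int8(weights):
    # Map floats to integers in [-127, 127] using one shared scale factor.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    # Recover approximate float values from the stored integers.
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # made-up example values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now fits in one byte instead of four (for 32-bit floats), at the cost of a rounding error of at most half the scale factor per weight.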

You can view the Benchmark LLM Speed Inference from Float16 at the link below.

Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation is a technique used to enhance the performance of LLMs (Large Language Models) by providing them with specialized or updated information, enabling the models to answer more specific questions accurately.

Since pre-trained models are trained on broad, general data that ends at a certain point in time, we need to provide the models with the specific data we want them to know. However, training a model directly on that data requires substantial resources. Therefore, the RAG process is a popular technique, which involves the following steps:

  1. Embed the additional data into vectors and store them in a vector database.

  2. The user submits a question to the system.

  3. The question is converted into a vector (typically by an embedding model), which is used to search for relevant information in the vector database.

  4. The LLM receives the question together with the relevant information and composes a response in human language.
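The steps above can be sketched end to end with a toy setup. Everything here is an assumption for illustration: word counts stand in for a real embedding model, a Python list stands in for a vector database, and the final prompt is printed rather than sent to an LLM.

```python
import math
from collections import Counter

DOCUMENTS = [
    "The clinic is open from 8 am to 5 pm on weekdays.",
    "Parking is free for the first two hours.",
    "Appointments can be booked by phone or online.",
]


def embed(text):
    # Toy embedding (assumption): word counts instead of a learned vector.
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Step 1: embed the additional data and store it.
vector_db = [(embed(doc), doc) for doc in DOCUMENTS]


def retrieve(question, k=1):
    # Step 3: embed the question and find the most similar stored entries.
    q = embed(question)
    ranked = sorted(vector_db, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_rag_prompt(question):
    # Step 4: hand the question plus retrieved context to the LLM.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Step 2: the user's question enters the system.
print(build_rag_prompt("When is the clinic open?"))
```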

You can learn more about RAG from this Link.

Token

A token is the smallest unit of data that is segmented for processing. It can be a word, character, or other subunit used in data analysis and processing. Each model and language may use different methods to count tokens, which can be in the form of words, syllables, or characters.

The processing speed of an LLM is typically measured in tokens per second (tokens/second), representing the number of tokens that can be processed or generated per second. The higher the number, the faster the response. Additionally, the pricing model for LLM as a Service is often based on the cost per 1 million tokens.
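Both measures are simple arithmetic. The figures below are invented for illustration and do not reflect any real provider's speed or pricing.

```python
def tokens_per_second(token_count, seconds):
    # Throughput: tokens processed or generated per second.
    return token_count / seconds


def cost_usd(token_count, price_per_million_usd):
    # Cost under per-1-million-token pricing.
    return token_count / 1_000_000 * price_per_million_usd


# Hypothetical numbers: 500 tokens generated in 10 seconds,
# and 250,000 tokens billed at 2.00 USD per 1 million tokens.
print(tokens_per_second(500, 10))
print(cost_usd(250_000, 2.0))
```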

You can try out the Tokenizer Playground at the link below.

Vector Database

A vector database is a type of database designed to store and search data in the form of vectors. Generally, vectors are sets of numbers that represent features or various types of data, such as image data, text data, or audio data. Using vectors to represent data makes it easier to perform calculations or statistical analyses.

Vector databases are particularly important in the context of LLMs (Large Language Models), especially for similarity searches. These include tasks such as finding similar images, searching for text with similar meanings, or finding similar audio clips.
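A similarity search can be sketched with a tiny in-memory store. The class name, item names, and three-dimensional vectors are made up for illustration. Real vector databases use learned embeddings with many more dimensions and indexing structures to avoid comparing against every stored vector.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    # Hypothetical in-memory stand-in for a vector database.
    def __init__(self):
        self._items = []

    def add(self, item_id, vector):
        self._items.append((item_id, vector))

    def search(self, query, k=2):
        # Brute-force search: score every stored vector, return the top k ids.
        scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in self._items]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:k]]


store = TinyVectorStore()
store.add("cat_photo", [0.9, 0.1, 0.0])
store.add("kitten_photo", [0.8, 0.2, 0.1])
store.add("car_photo", [0.0, 0.1, 0.9])

print(store.search([1.0, 0.0, 0.0], k=2))
```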

You can learn more about Vector Databases from the link below.
