RAG & EMBEDDING

A Hybrid Architecture at the Core of Modern AI

What is RAG?

Retrieval-Augmented Generation (RAG) is an architecture that combines two major components of natural language processing:

  1. Retrieval: an information retrieval step that fetches relevant documents or passages from a knowledge base.
  2. Generation: a response generation step driven by a Large Language Model (LLM), leveraging the retrieved documents to produce contextualized answers.
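
As a rough illustration of how these two stages compose, here is a toy sketch in Python; the naive keyword-overlap retriever and the tiny in-memory corpus are stand-ins for the embedding-based search described below, and the resulting prompt would be sent to whichever LLM the stack uses.

```python
# Toy sketch of the two RAG stages. Retrieval here is naive keyword
# overlap; real systems use the embedding-based search described below.
# The returned prompt would then be sent to any LLM client.

CORPUS = [
    "RAG combines a retriever with a generative language model.",
    "Embeddings map text into a dense, fixed-dimensional vector space.",
    "Vector databases index embeddings for fast similarity search.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Stage 1: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(CORPUS,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_augmented_prompt(query: str) -> str:
    """Stage 2 input: inject the retrieved passages into the LLM prompt."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_augmented_prompt("How does a vector database help RAG?"))
```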

This approach sidesteps key limitations of fine-tuning (stale knowledge, costly retraining) by integrating external knowledge dynamically at inference time.


What is an embedding?

Embeddings are vector representations of textual units (sentences, paragraphs, documents) in a dense, fixed-dimensional space (typically 384 to 1536 dimensions, depending on the model).

They are generated by specialized models, often distinct from LLMs (e.g., sentence transformers, OpenAI text-embedding-3-small, Instructor, GTE, etc.).
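
A minimal sketch with the open-source sentence-transformers library (the model name all-MiniLM-L6-v2 is one common choice among many; it produces 384-dimensional vectors):

```python
# Sketch: encoding text into dense vectors with sentence-transformers.
# all-MiniLM-L6-v2 is one common open model; it outputs 384-dim vectors.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "RAG retrieves documents before generating an answer.",
    "The cat sat on the mat.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

print(embeddings.shape)  # (2, 384): two sentences, 384 dimensions each
# With L2-normalized vectors, cosine similarity reduces to a dot product.
print(embeddings[0] @ embeddings[1])
```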

How are embeddings used in RAG?

Here are the typical processing steps in a RAG pipeline (each step is sketched in code after the list):

  1. Semantic Indexing (offline)
     • Reference documents are segmented into chunks (e.g., 200–500 tokens).
     • Each chunk is encoded into a vector by an embedding encoder.
     • The vectors are stored in a vector database (Vector DB) such as FAISS, Weaviate, Qdrant, Pinecone, etc.
  2. Semantic Search (online)
     • When a user query is submitted, it is encoded into a vector with the same embedding model.
     • Vector similarity (cosine, dot-product, etc.) is computed between the query and the indexed chunks.
     • The top-k most relevant passages are retrieved.
  3. Augmented Generation
     • The retrieved passages are injected into the LLM prompt, typically via prompt stuffing (in-context learning), sometimes combined with chain-of-thought prompting.
     • The LLM generates a response grounded in the retrieved information, yielding answers that are:
       o Contextualized
       o Far less prone to hallucination
       o Specific to the domain or organization
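
Steps 1 and 2 can be sketched with sentence-transformers and FAISS. This sketch assumes the same all-MiniLM-L6-v2 model as above, uses an exact inner-product index over normalized vectors (equivalent to cosine similarity), and replaces real chunking with three short example documents.

```python
# Sketch of steps 1-2: offline indexing and online semantic search.
# Inner product over L2-normalized vectors is equivalent to cosine.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Semantic indexing (offline). Real pipelines would first split long
#    documents into 200-500 token chunks; these toy docs stand in for chunks.
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available by email from 9am to 6pm CET.",
    "Enterprise plans include a dedicated account manager.",
]
vectors = model.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(vectors.shape[1])  # exact inner-product index
index.add(vectors)

# 2. Semantic search (online): encode the query, retrieve the top-k chunks.
query_vectors = model.encode(["How long do I have to return an item?"],
                             normalize_embeddings=True)
scores, ids = index.search(query_vectors, k=2)
top_chunks = [chunks[i] for i in ids[0]]
print(top_chunks)
```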
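
Step 3 then reduces to prompt construction. In this sketch, top_chunks comes from the search above, and llm_complete is a hypothetical stand-in for whichever LLM client is used (GPT-4, Claude, a local LLaMA or Mistral, etc.):

```python
# Sketch of step 3: prompt stuffing. `top_chunks` comes from the search
# sketch above; `llm_complete` is a hypothetical stand-in for an LLM client.

def build_prompt(query: str, passages: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the numbered passages below. "
        "If the answer is not in the passages, say so.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_prompt("How long do I have to return an item?", top_chunks)
# answer = llm_complete(prompt)  # hypothetical call to the chosen LLM
print(prompt)
```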

Component         Possible Technologies
Embedding Model   OpenAI, HuggingFace (e.g., sentence-transformers), Cohere
Vector Store      FAISS, Qdrant, etc.
LLM               LLaMA 3/4, Mistral, GPT-4, Claude, etc.
Orchestration     LangChain, etc.
Cloud Stack       Azure AI Search, AWS Kendra + Bedrock, GCP Vertex AI