Basic RAG Architecture with Key Components

Now that we've explored the various components that make up the architecture of Large Language Models (LLMs), let's dive into how Retrieval-Augmented Generation (RAG) works in synergy with them. The aim is to show you how RAG can supercharge an LLM's capabilities by seamlessly integrating real-time or static data sources into the retrieval and generation process.

RAG/LLM Architecture

For a nuanced understanding of how Retrieval-Augmented Generation (RAG) enhances Large Language Models, let's walk through the essential components and procedural steps that make up a RAG-enabled LLM architecture.

  1. Data Sources: Whether your starting point is cloud storage, Git repositories, or databases like PostgreSQL, the first task is to bring these varied data forms together through pre-configured connectors.

  2. Dynamic Vector Indexing: Text from these data sources is broken down into smaller segments (also called "chunks") and converted into vector representations. Models specialized for text embeddings, such as OpenAI's text-embedding-ada-002, are employed here. These vectors are continuously indexed to facilitate rapid search later on. (This chunk-and-embed flow, together with the query transformation in step 3, is shown in the first code sketch after the diagram below.)

  3. Query Transformation: A user’s input query is likewise transformed into a compatible vector representation, ensuring that it can be effectively matched with the indexed vectors for data retrieval.

  4. Contextual Retrieval: Algorithms like Locality-Sensitive Hashing (LSH) are applied to find the closest matches between the user query and the indexed data vectors while staying within the model's token limitations (second sketch below).

  5. Text Generation: With the retrieved context, foundational LLMs like GPT-3.5 Turbo or Llama-2 employ techniques from the Transformer architecture, such as self-attention, to generate an appropriate response (third sketch below).

  6. User Interface: Finally, the generated text is presented to the user via interfaces like Streamlit or ChatGPT (a minimal Streamlit sketch closes this page).

LLM Architecture Diagram showing how RAG works with Real-time or Static Data Sources
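
To make steps 2 and 3 concrete, here is a minimal Python sketch of chunking documents and embedding both the chunks and a user query with OpenAI's text-embedding-ada-002. The chunk sizes, placeholder document, and helper names are illustrative assumptions rather than the course's reference implementation; the snippet assumes the openai Python package (v1+) with an API key in your environment.

```python
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set in the environment

client = OpenAI()

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Step 2a: split a document into overlapping character-based chunks (sizes are illustrative)."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

def embed(texts: list[str]) -> list[list[float]]:
    """Step 2b: convert text segments into vector representations."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in response.data]

# Build the index: embed every chunk from every connected data source.
documents = ["...text pulled in by your data-source connectors..."]
chunks = [c for doc in documents for c in chunk_text(doc)]
index = list(zip(chunks, embed(chunks)))

# Step 3: the user's query goes through the same embedding model so it can be
# matched against the indexed vectors.
query_vector = embed(["How do I reset my password?"])[0]
```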
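
Step 4 can be illustrated with an exact nearest-neighbour search over the index built above. Plain cosine similarity is used here for clarity; LSH, covered in the bonus modules, approximates the same matching far more cheaply at scale. The token budget and the four-characters-per-token estimate are rough assumptions for the sketch.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two embedding vectors; higher means a closer match."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vector: list[float], index: list[tuple[str, list[float]]],
             token_budget: int = 3000) -> list[str]:
    """Step 4: rank chunks by similarity to the query, keeping what fits the token budget."""
    ranked = sorted(index, key=lambda pair: cosine_similarity(query_vector, pair[1]),
                    reverse=True)
    context, used = [], 0
    for chunk, _vector in ranked:
        cost = len(chunk) // 4  # rough estimate: ~4 characters per token
        if used + cost > token_budget:
            break
        context.append(chunk)
        used += cost
    return context

context_chunks = retrieve(query_vector, index)
```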
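
For step 5, the retrieved chunks are packed into the prompt of a foundational model such as GPT-3.5 Turbo, which generates an answer grounded in that context. The prompt wording below is an illustrative assumption; any instruction that ties the answer to the supplied context will do.

```python
def generate(question: str, context_chunks: list[str]) -> str:
    """Step 5: let the LLM answer using only the retrieved context."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks)
        + f"\n\nQuestion: {question}"
    )
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

print(generate("How do I reset my password?", context_chunks))
```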
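
Finally, step 6 needs only a thin presentation layer. Below is a minimal Streamlit sketch that glues the earlier helpers together; it assumes embed, retrieve, generate, and index from the sketches above are in scope, and you would launch it with `streamlit run app.py`.

```python
import streamlit as st

def answer(question: str) -> str:
    # Wire the earlier sketches together: embed the query, retrieve context, generate.
    query_vector = embed([question])[0]
    return generate(question, retrieve(query_vector, index))

st.title("RAG Demo")
user_question = st.text_input("Ask a question about your data")
if user_question:
    st.write(answer(user_question))
```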