What is Retrieval Augmented Generation (RAG)?


Large Language Models (LLMs) like GPT-4 or Mistral 7B are extraordinary in many ways, yet they come with challenges.

For now, let's focus on one specific limitation: the timeliness of their data. Since these models are trained up to a particular cut-off date, they aren't well-suited for real-time or organization-specific information.

Imagine you're a developer architecting an LLM-enabled app for Amazon. You're aiming to support shoppers as they comb through Amazon for the latest deals on sneakers. Naturally, you want to furnish them with the most current offers available. After all, nobody wants to rely on outdated information, and the same holds for data queried from your LLM.

This is where Retrieval-Augmented Generation, commonly known as RAG, significantly improves the capabilities of LLMs.

Think of RAG as a resourceful friend in an exam hall or during a speech who, figuratively speaking, swiftly passes you the most relevant "cue card" from a pile of information, so you know exactly what to write or say next.

With RAG, efficient retrieval of the most relevant data for your use case ensures the text generated is both current and substantiated.

RAG, as its name indicates, operates through a three-fold process:

  • Retrieval: It sources pertinent information.

  • Augmentation: This information is then added to the model's initial input.

  • Generation: Finally, the LLM utilizes this augmented input to create an informed output.

Simply put, RAG empowers LLMs to include real-time, reliable data from external databases in their generated text.
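The three steps above can be sketched in a few lines of Python. This is a minimal, illustrative toy: the corpus, the word-overlap scoring, and the `generate()` stub are hypothetical stand-ins, whereas a real system would use embeddings with a vector store for retrieval and an actual LLM API call for generation.

```python
def retrieve(query, corpus, k=1):
    """Retrieval: rank documents by simple word overlap with the query.
    (A real pipeline would use embedding similarity instead.)"""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def augment(query, docs):
    """Augmentation: prepend the retrieved context to the user's question."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Generation: placeholder for an LLM call (e.g. a chat-completion API)."""
    return f"[LLM answer grounded in]\n{prompt}"

corpus = [
    "Sneaker deal: Acme runners are 40% off today only.",
    "Our return policy allows refunds within 30 days.",
]
query = "latest sneaker deals"
prompt = augment(query, retrieve(query, corpus))
print(generate(prompt))
```

Because the retrieved snippet is injected into the prompt at query time, the model can answer with today's sneaker discount even though that offer never appeared in its training data.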

For a better explanation, check out this video by Marina Danilevsky, Senior Research Staff Member at IBM Research. She shares two key challenges with LLMs resolved with the help of Retrieval Augmented Generation.

(Credits: IBM Technology)
Perhaps that wasn't the perfect example, but you get the point.
😄