Comprehensive and Detailed Explanation From Exact Extract:
In a Retrieval Augmented Generation (RAG) architecture, the steps that do not depend on a live user request can be optimized with offline batch processing:
A. Generation of content embeddings: When new content is published, it can be processed in batches to generate embeddings (vector representations) offline. These embeddings are then used at query time for similarity search. Because new documents arrive daily, batch processing is well suited to generating embeddings for all of the new content together.
“Content/document embeddings are typically generated offline, as this operation can be computationally expensive and does not need to happen in real-time.”
(Reference: AWS GenAI RAG Blog, Amazon Bedrock RAG Pattern)
C. Creation of the search index: After the content embeddings are generated, they are indexed in a vector database or search service. This indexing is also typically performed in batch as part of the offline pipeline.
“Building or updating the vector index is often performed as a batch operation, reflecting the latest state of the content repository.”
(Reference: AWS RAG Pattern Whitepaper)
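As an illustration of the two offline steps (A and C), here is a minimal sketch of a daily batch job. It is not taken from the AWS references; the names toy_embed, embed_batch, build_index, and DIM are hypothetical, the hashed bag-of-words function merely stands in for a real embedding model, and a plain dict stands in for a vector database.

```python
import hashlib
import math
from typing import Dict, List

DIM = 64  # hypothetical embedding size for this toy example


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_batch(docs: Dict[str, str]) -> Dict[str, List[float]]:
    """Offline step A: embed every newly published document in one pass."""
    return {doc_id: toy_embed(body) for doc_id, body in docs.items()}


def build_index(
    index: Dict[str, List[float]], new_vectors: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Offline step C: merge the new vectors into the index (a plain dict here)."""
    updated = dict(index)
    updated.update(new_vectors)
    return updated


# One scheduled run over the documents published that day (sample data).
todays_docs = {
    "doc-1": "batch processing of document embeddings",
    "doc-2": "real time response generation for user queries",
}
index = build_index({}, embed_batch(todays_docs))
```

Neither function needs a user query as input, which is why both can run on a schedule rather than per request.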
B, D, and E are real-time steps. Embedding the user query (B), retrieving relevant content (D), and generating the response (E) all depend on the user's query, which is not known until the request arrives, so they must run in real time to provide an interactive experience.
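The request-time path (B, D, and E) can be sketched in the same spirit. This is an assumption-laden toy, not the AWS implementation: toy_embed, retrieve, and answer are hypothetical names, the index is a small dict assumed to have been built offline, and the generation step is reduced to assembling a prompt because no real model is called.

```python
import hashlib
import math
from typing import Dict, List, Tuple

DIM = 64  # hypothetical embedding size for this toy example


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(
    query: str, index: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Steps B and D: embed the query, then rank documents by cosine similarity."""
    q = toy_embed(query)
    scored = [
        (doc_id, sum(a * b for a, b in zip(q, vec))) for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def answer(query: str, index: Dict[str, List[float]], corpus: Dict[str, str]) -> str:
    """Step E placeholder: build the prompt a generator model would receive."""
    top_id, _score = retrieve(query, index, k=1)[0]
    return f"Context: {corpus[top_id]}\nQuestion: {query}"


corpus = {
    "doc-1": "batch processing of document embeddings",
    "doc-2": "real time response generation for user queries",
}
# Assumed to have been produced earlier by the offline batch job.
index = {doc_id: toy_embed(body) for doc_id, body in corpus.items()}

prompt = answer("document embeddings batch", index, corpus)
```

Every function here takes the query as an argument, so none of this work can be precomputed in a batch.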
References: Retrieval Augmented Generation (RAG) on AWS; Amazon Bedrock RAG Documentation