11# Design Proposal - Embedding Ingestion Pipeline And RAG-Based Chat
22
3- ** TODOs **
3+ Not addressed in this document:
44
55* Vector store authentication options.
66* Document versioning and data update policies.
@@ -22,9 +22,10 @@ This document proposes enhancements to the `ilab` CLI to support workflows utili
2222(RAG) artifacts within ` InstructLab ` . The proposed changes introduce new commands and options for the embedding ingestion
2323and RAG-based chat pipelines:
2424
25- * A new ` ilab data ` sub-command to process customer documentation.
25+ * A new `ilab rag` command group, feature-gated behind an `ILAB_DEV_PREVIEW` environment variable.
26+ * A new `ilab rag` sub-command to process customer documentation.
2627 * Either from knowledge taxonomy or from actual user documents.
27- * A new ` ilab data ` sub-command to generate and ingest embeddings from pre-processed documents into a configured vector store.
28+ * A new ` ilab rag ` sub-command to generate and ingest embeddings from pre-processed documents into a configured vector store.
2829* An option to enhance the chat pipeline by using the stored embeddings to augment the context of conversations, improving relevance and accuracy.
2930
3031### 1.1 User Experience Overview
@@ -94,18 +95,18 @@ consistently to all new and updated commands.
9495
9596### 2.2 Document Processing Pipeline
9697
97- The proposal is to add a ` process ` sub-command to the ` data ` command group.
98+ The proposal is to add a `convert` sub-command to the `rag` command group.
9899
99100For the Taxonomy path (no Model Training):
100101
101102``` bash
102- ilab data process --output /path/to/processed/folder
103+ ilab rag convert --output /path/to/processed/folder
103104```
104105
105106For the Plug-and-Play RAG path:
106107
107108``` bash
108- ilab data process --input /path/to/docs/folder --output /path/to/processed/folder
109+ ilab rag convert --input /path/to/docs/folder --output /path/to/processed/folder
109110```
110111
111112#### Processing-Command Purpose
@@ -134,11 +135,13 @@ The generated artifacts can later be used to generate and ingest the embeddings
134135
135136### 2.3 Document Processing Pipeline Options
136137
138+ **Note**: The `--help` option will be aware of the `rag` command group only if the `ILAB_DEV_PREVIEW` environment variable is set to `true`.
139+
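The feature gate described in the note can be illustrated with a minimal sketch. It is not the `ilab` implementation, which this proposal does not show. It uses the standard-library `argparse` module to keep the example self-contained, and the helper names `dev_preview_enabled` and `build_parser`, the placeholder `model` group, and the case-insensitive comparison are all assumptions made for illustration.

```python
import argparse
import os


def dev_preview_enabled(env=None):
    """Return True when ILAB_DEV_PREVIEW is set to "true".

    Case-insensitive matching is an assumption; the proposal only says "true".
    """
    env = os.environ if env is None else env
    return env.get("ILAB_DEV_PREVIEW", "").strip().lower() == "true"


def build_parser(env=None):
    """Build a toy `ilab` parser that registers `rag` only in dev preview."""
    parser = argparse.ArgumentParser(prog="ilab")
    groups = parser.add_subparsers(dest="group")
    # Placeholder for the command groups that are always available.
    groups.add_parser("model", help="Model commands")
    if dev_preview_enabled(env):
        # The gated group: invisible to --help unless the variable is set.
        rag = groups.add_parser("rag", help="RAG commands (dev preview)")
        rag_cmds = rag.add_subparsers(dest="command")
        rag_cmds.add_parser("convert", help="The document processing pipeline")
        rag_cmds.add_parser("ingest", help="The embedding ingestion pipeline")
    return parser
```

With the variable unset, the parser built here neither lists nor accepts `rag`; with `ILAB_DEV_PREVIEW=true`, `rag convert` and `rag ingest` parse normally.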
137140``` bash
138- % ilab data process --help
139- Usage: ilab data process [OPTIONS]
141+ % ilab rag convert --help
142+ Usage: ilab rag convert [OPTIONS]
140143
141- The document processing pipeline
144+ The document processing pipeline for retrieval augmented generation
142145
143146Options:
144147 --input DIRECTORY The folder with user documents to process. In case
@@ -159,23 +162,23 @@ Options:
159162
160163### 2.4 Embedding Ingestion Pipeline
161164
162- The proposal is to add an ` ingest` sub-command to the ` data ` command group.
165+ The proposal is to add an `ingest` sub-command to the `rag` command group.
163166
164167For the Model Training path:
165168
166169```bash
167- ilab data ingest
170+ ilab rag ingest
168171```
169172
170173For the Taxonomy or Plug-and-Play RAG paths:
171174
172175```bash
173- ilab data ingest --input path/to/processed/folder
176+ ilab rag ingest --input path/to/processed/folder
174177```
175178
176179#### Ingestion-Working Assumption
177180
178- The documents at the specified path have already been processed using the ` data process ` command or an equivalent method
181+ The documents at the specified path have already been processed using the ` rag convert ` command or an equivalent method
179182(see [Getting Started with Knowledge Contributions][ilab-knowledge]).
180183
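The ordering implied by this assumption (convert first, then ingest the converted output) can be sketched as a small helper that assembles the two command lines. The function name `rag_workflow_commands` is hypothetical, and only the flags shown in this proposal (`--input`, `--output`) are used. The Model Training path, where `ilab rag ingest` runs without `--input`, is not covered by this sketch.

```python
def rag_workflow_commands(processed_dir, docs_dir=None):
    """Return argv lists for `ilab rag convert` followed by `ilab rag ingest`.

    docs_dir=None models the Taxonomy path (no --input on convert);
    a value models the Plug-and-Play RAG path.
    """
    convert = ["ilab", "rag", "convert"]
    if docs_dir is not None:
        # Plug-and-Play RAG path: convert the user's own documents.
        convert += ["--input", str(docs_dir)]
    convert += ["--output", str(processed_dir)]
    # Ingestion reads from the folder that the convert step wrote to.
    ingest = ["ilab", "rag", "ingest", "--input", str(processed_dir)]
    return [convert, ingest]
```

Each returned list could be handed to `subprocess.run` in order, with `ILAB_DEV_PREVIEW=true` present in the environment so that the `rag` group is available.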
181184#### Ingestion-Command Purpose
@@ -209,9 +212,11 @@ context for RAG-based chat pipelines.
209212
210213### 2.5 Embedding Ingestion Pipeline Options
211214
215+ **Note**: The `--help` option will be aware of the `rag` command group only if the `ILAB_DEV_PREVIEW` environment variable is set to `true`.
216+
212217```bash
213- % ilab data ingest --help
214- Usage: ilab data ingest [OPTIONS]
218+ % ilab rag ingest --help
219+ Usage: ilab rag ingest [OPTIONS]
215220
216221 The embedding ingestion pipeline
217222
@@ -411,7 +416,7 @@ ilab model chat --rag --retrieval-strategy query-expansion --retrieval-strategy-
411416Generate a containerized RAG artifact to expose a `/query` endpoint that can serve as an alternative source:
412417
413418```bash
414- ilab data ingest --build-image --image-name=docker.io/user/my_rag_artifacts:1.0
419+ ilab rag ingest --build-image --image-name=docker.io/user/my_rag_artifacts:1.0
415420```
416421
417422Then serve it and use it in a chat session: