Commit cb82975

Merge pull request #44 from cogsol/codex/create-pr-for-issue-18-cpt0n8
Docs: use topic-aligned `data/<topic-path>/` ingest paths and add nested-topic examples (fixes #18 and #19)
2 parents fb2af75 + 3672445 commit cb82975
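
The `data/<topic-path>/` convention named in the commit title can be sketched as a small shell helper. This is only an illustration of the mapping the docs describe. `topic_data_dir` and `ingest_command` are hypothetical names that are not part of the project, and the helper prints the `ingest` command rather than running it.

```shell
# Hypothetical helper (not part of the project): map a topic path to the
# topic-aligned folder used in the updated docs, e.g.
# documentation/tutorials -> ./data/documentation/tutorials/
topic_data_dir() {
    topic=${1#/}        # drop a leading slash, if any
    topic=${topic%/}    # drop a trailing slash, if any
    printf './data/%s/\n' "$topic"
}

# Hypothetical helper: print (do not run) the ingest command for a topic.
# Any extra arguments, such as --dry-run, are appended unchanged.
ingest_command() {
    topic=${1#/}
    topic=${topic%/}
    shift
    cmd="python manage.py ingest $topic $(topic_data_dir "$topic")"
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    printf '%s\n' "$cmd"
}

ingest_command documentation/tutorials --dry-run
```

Printing the command keeps the sketch safe to try outside a project; inside one, the printed line can be reviewed and then run by hand.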

4 files changed

Lines changed: 39 additions & 13 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Documentation
 - Environment variable and authentication docs updated to use `COGSOL_API_KEY` and optional Azure AD B2C credentials.
 - Removed outdated "no external dependencies" statements from README.
+- Added nested-topic ingestion examples and corrected ingest file paths to use topic-aligned `data/<topic-path>/` locations in docs.
 - Clarified in README topic examples that `documentation` is only a sample topic name and not required.
 - Retrieval-tool examples now instantiate retrieval definitions (e.g., `ProductDocsRetrieval()`) to avoid runtime confusion from class references.
 - Setup guides now explicitly document creating and activating a local `.venv` before installing dependencies.

README.md

Lines changed: 19 additions & 5 deletions
@@ -163,7 +163,10 @@ python manage.py makemigrations data
 python manage.py migrate data
 
 # Ingest documents into a topic
-python manage.py ingest documentation ./docs/*.pdf
+python manage.py ingest documentation ./data/documentation/*.pdf
+
+# Ingest documents into a nested topic
+python manage.py ingest documentation/tutorials ./data/documentation/tutorials/*.pdf
 ```
 
 ---
@@ -297,6 +300,14 @@ python manage.py ingest <topic> <files...> [options]
 - `topic`: Topic path (e.g., `documentation` or `parent/child/topic`)
 - `files`: Files, directories, or glob patterns to ingest
 
+Use slash-separated paths for nested topics. For example, if you created `tutorials` under
+`documentation` with `starttopic tutorials --path documentation`, ingest into it with
+`documentation/tutorials`.
+
+For a topic-aligned workflow, place files under `data/<topic-path>/` and ingest from that
+folder (for example, `./data/documentation/*.pdf` or
+`./data/documentation/tutorials/*.pdf`).
+
 **Options:**
 - `--doc-type`: Document type (defaults to `Text Document`)
 - `--ingestion-config`: Name of an ingestion config from `data/ingestion.py`
@@ -313,13 +324,16 @@ python manage.py ingest <topic> <files...> [options]
 **Examples:**
 ```bash
 # Ingest PDF files
-python manage.py ingest documentation ./docs/*.pdf
+python manage.py ingest documentation ./data/documentation/*.pdf
+
+# Ingest into a child topic
+python manage.py ingest documentation/tutorials ./data/documentation/tutorials/*.pdf
 
 # Ingest with custom config
-python manage.py ingest documentation ./docs/ --ingestion-config HighQuality
+python manage.py ingest documentation ./data/documentation/ --ingestion-config HighQuality
 
 # Dry run to preview
-python manage.py ingest documentation ./data/ --dry-run
+python manage.py ingest documentation ./data/documentation/ --dry-run
 ```
 
 ### `topics`
@@ -613,7 +627,7 @@ from cogsol.content import BaseIngestionConfig, PDFParsingMode, ChunkingMode
 Use with the `ingest` command:
 
 ```bash
-python manage.py ingest documentation ./docs/ --ingestion-config high_quality
+python manage.py ingest documentation ./data/documentation/ --ingestion-config high_quality
 ```
 
 #### Reference Formatters

docs/commands.md

Lines changed: 13 additions & 5 deletions
@@ -587,6 +587,11 @@ python manage.py ingest <topic> <files...> [options]
 | `topic` | Yes | - | Topic path (e.g., `docs` or `parent/child`) |
 | `files` | Yes | - | Files, directories, or glob patterns |
 
+Use slash-separated paths for nested topics during ingestion (for example:
+`documentation/tutorials`). For a topic-aligned workflow, place files under
+`data/<topic-path>/` and ingest from that matching path (for example:
+`./data/documentation/*.pdf` and `./data/documentation/tutorials/*.pdf`).
+
 #### Options
 
 | Option | Default | Description |
@@ -628,27 +633,30 @@ class HighQualityConfig(BaseIngestionConfig):
 Then use with:
 
 ```bash
-python manage.py ingest documentation ./docs/ --ingestion-config high_quality
+python manage.py ingest documentation ./data/documentation/ --ingestion-config high_quality
 ```
 
 #### Example Usage
 
 ```bash
 # Ingest all PDFs in a directory
-python manage.py ingest documentation ./docs/*.pdf
+python manage.py ingest documentation ./data/documentation/*.pdf
+
+# Ingest into a child topic using parent/child path
+python manage.py ingest documentation/tutorials ./data/documentation/tutorials/*.pdf
 
 # Ingest an entire directory recursively
-python manage.py ingest documentation ./docs/
+python manage.py ingest documentation ./data/documentation/
 
 # Use custom settings
-python manage.py ingest documentation ./reports/ \
+python manage.py ingest documentation ./data/documentation/reports/ \
 --doc-type "Text Document" \
 --pdf-mode ocr \
 --chunking ingestor \
 --max-size-block 2000
 
 # Preview what would be ingested
-python manage.py ingest documentation ./docs/ --dry-run
+python manage.py ingest documentation ./data/documentation/ --dry-run
 ```
 
 #### Output Messages

docs/getting-started.md

Lines changed: 6 additions & 3 deletions
@@ -623,14 +623,17 @@ python manage.py migrate data
 
 ### Step 8: Ingest Documents
 
-Upload documents to your topic:
+Upload documents to your topic. In this guide, examples place files under `data/<topic-path>/` so the file location mirrors the topic path:
 
 ```bash
 # Ingest a directory of documents
-python manage.py ingest product_docs ./docs/
+python manage.py ingest product_docs ./data/product_docs/
+
+# Ingest into a nested child topic (parent/child path)
+python manage.py ingest product_docs/tutorials ./data/product_docs/tutorials/*.pdf
 
 # Preview first (dry run)
-python manage.py ingest product_docs ./data/product_docs/ --dry-run
+python manage.py ingest product_docs ./data/product_docs/ --dry-run
 ```
 
 ### Step 9: List Topics
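
The nested-topic examples in these diffs pass a glob such as `./data/product_docs/tutorials/*.pdf`. In a POSIX shell, a glob that matches nothing is handed to the command as the literal pattern, and how `ingest` treats that case is not stated here. A guard like the one below may therefore be useful before ingesting. It is a minimal sketch: `has_pdfs` is a hypothetical helper, and the demo runs in a scratch directory with an empty placeholder file rather than a real project.

```shell
# Hypothetical helper: succeed only if the folder directly contains
# at least one path matching *.pdf.
has_pdfs() {
    for f in "$1"/*.pdf; do
        # With no matches the pattern stays literal, so confirm the path exists.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Illustrative run in a scratch directory so no project files are touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/product_docs/tutorials"

has_pdfs "$workdir/data/product_docs/tutorials" || echo "no PDFs yet, skip ingest"

: > "$workdir/data/product_docs/tutorials/intro.pdf"   # empty placeholder file
has_pdfs "$workdir/data/product_docs/tutorials" && echo "ready to ingest"
```

The check looks only at the given folder, which mirrors the per-topic globs in the examples. A parent topic folder that holds only child folders is reported as having no PDFs.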