diff --git a/cli/commands/index.md b/cli/commands/index.md
index 383347d0..515f3d55 100644
--- a/cli/commands/index.md
+++ b/cli/commands/index.md
@@ -90,6 +90,10 @@ kbc help local create row
 | [kbc dbt generate profile](/cli/commands/dbt/generate/profile/) | Generate profiles for use with dbt. |
 | [kbc dbt generate sources](/cli/commands/dbt/generate/sources/) | Generate sources for use with dbt. |
 | [kbc dbt generate env](/cli/commands/dbt/generate/env/) | Generate environment variables for use with dbt. |
+| | |
+| **[kbc llm](/cli/commands/llm/) (BETA)** | **Export project data to AI-optimized format.** |
+| [kbc llm init](/cli/commands/llm/init/) | Initialize a new local directory for LLM export. |
+| [kbc llm export](/cli/commands/llm/export/) | Export project data to AI-optimized twin format. |
 
 ## Aliases
 
diff --git a/cli/commands/llm/export/index.md b/cli/commands/llm/export/index.md
new file mode 100644
index 00000000..e341a841
--- /dev/null
+++ b/cli/commands/llm/export/index.md
@@ -0,0 +1,122 @@
+---
+title: LLM Export Command
+permalink: /cli/commands/llm/export/
+---
+
+* TOC
+{:toc}
+
+
+
+**Export project data to AI-optimized twin format directory structure.**
+
+```
+kbc llm export [flags]
+```
+
+The command must be run in a directory initialized with [kbc llm init](/cli/commands/llm/init/).
+
+## Description
+
+The twin format is designed for AI assistants to understand and work with Keboola projects directly from Git repositories. The export includes:
+
+- **Bucket and table metadata** with schema information
+- **Transformation configurations** with platform detection
+- **Component configurations** organized by type
+- **Job execution history** and statistics
+- **Lineage graph** showing data flow dependencies
+- **Optional data samples** (controlled by flags)
+
+The export creates output files containing JSON with inline documentation (`_comment`, `_purpose`, `_update_frequency` fields) to help AI assistants understand the data structure.
+
+### Security Features
+
+- **Public repository detection** - Automatically detects if the directory is a public Git repository
+- **Sample export disabled by default** - Data samples must be explicitly enabled with `--with-samples`
+- **Encrypted secrets** - Fields starting with `#` are encrypted in the output
+
+## Options
+
+`-H, --storage-api-host `
+: Keboola instance URL, e.g., "connection.keboola.com"
+
+`-t, --storage-api-token `
+: Storage API token from your project
+
+`-f, --force`
+: Skip confirmation when directory contains existing files
+
+`--with-samples`
+: Include table data samples in the export
+
+`--sample-limit `
+: Maximum number of rows per table sample (default: 100, max: 1000)
+
+`--max-samples `
+: Maximum number of tables to sample (default: 50, max: 100)
+
+[Global Options](/cli/commands/#global-options)
+
+## Output Structure
+
+The export creates the following directory structure:
+
+```
+.
+├── buckets/                    # Bucket and table metadata
+│   └── index.json
+├── transformations/            # Transformation configurations
+├── components/                 # Component configurations by type
+├── jobs/                       # Job execution history
+│   ├── recent/
+│   └── by-component/
+├── indices/                    # Query indices and lookups
+│   └── queries/
+├── ai/                         # AI assistant guides
+├── samples/                    # Table data samples (if --with-samples)
+├── lineage.json                # Data flow dependencies
+└── metadata.json               # Project metadata
+```
+
+## Examples
+
+### Basic Export
+
+```
+➜ kbc llm export
+
+[1/5] Getting default branch...
+Using branch: Main (ID: 1234)
+[2/5] Fetching project data from APIs...
+Fetched: 5 buckets, 23 tables, 150 jobs
+[3/5] Processing data (lineage, platforms, sources)...
+Processed: 5 buckets, 23 tables, 8 transformations, 45 lineage edges
+[4/5] Generating twin format output...
+[5/5] Skipping samples (not requested)
+Twin format exported to: /path/to/project
+Export completed successfully.
+```
+
+### Export with Data Samples
+
+```
+➜ kbc llm export --with-samples --sample-limit 50 --max-samples 20
+
+[1/5] Getting default branch...
+Using branch: Main (ID: 1234)
+[2/5] Fetching project data from APIs...
+Fetched: 5 buckets, 23 tables, 150 jobs
+[3/5] Processing data (lineage, platforms, sources)...
+Processed: 5 buckets, 23 tables, 8 transformations, 45 lineage edges
+[4/5] Generating twin format output...
+[5/5] Fetching and generating table samples...
+Twin format exported to: /path/to/project
+Export completed successfully.
+```
+
+## Next Steps
+
+- [LLM Init](/cli/commands/llm/init/)
+- [All Commands](/cli/commands/)
diff --git a/cli/commands/llm/index.md b/cli/commands/llm/index.md
new file mode 100644
index 00000000..6be085ee
--- /dev/null
+++ b/cli/commands/llm/index.md
@@ -0,0 +1,39 @@
+---
+title: LLM Commands (BETA)
+permalink: /cli/commands/llm/
+---
+
+* TOC
+{:toc}
+
+
+
+**Export project data to AI-optimized format for use with AI assistants and LLMs.**
+
+The `kbc llm` commands create a "twin format" representation of your Keboola project,
+designed for AI assistants to understand and work with your data pipelines.
+
+```
+kbc llm [command]
+```
+
+## Workflow
+
+1. **Initialize** - Run `kbc llm init` to set up the local directory
+2. **Export** - Run `kbc llm export` to generate AI-optimized project data
+
+## Available Commands
+
+|---
+| Command | Description
+|-|-|-
+| [kbc llm init](/cli/commands/llm/init/) | Initialize a new local directory for LLM export. |
+| [kbc llm export](/cli/commands/llm/export/) | Export project data to AI-optimized twin format. |
+
+## Next Steps
+
+- [LLM Init](/cli/commands/llm/init/)
+- [LLM Export](/cli/commands/llm/export/)
+- [All Commands](/cli/commands/)
diff --git a/cli/commands/llm/init/index.md b/cli/commands/llm/init/index.md
new file mode 100644
index 00000000..fdc65541
--- /dev/null
+++ b/cli/commands/llm/init/index.md
@@ -0,0 +1,69 @@
+---
+title: LLM Init Command
+permalink: /cli/commands/llm/init/
+---
+
+* TOC
+{:toc}
+
+
+
+**Initialize a new local directory for LLM export.**
+
+```
+kbc llm init [flags]
+```
+
+The command must be run in an empty directory.
+
+This command creates the local manifest and metadata directory (`.keboola/`) without pulling any data from Keboola Connection.
+Use [kbc llm export](/cli/commands/llm/export/) after initialization to generate the AI-optimized project data.
+
+If the command is run without options, it will start an interactive dialog asking for:
+- URL of the [stack](https://help.keboola.com/overview/#stacks), for example, `connection.keboola.com`.
+- [Storage API token](https://help.keboola.com/management/project/tokens/) to your project.
+- Allowed [branches](https://help.keboola.com/tutorial/branches/) to work with.
+
+## Options
+
+`-H, --storage-api-host `
+: Keboola instance URL, e.g., "connection.keboola.com"
+
+`-t, --storage-api-token `
+: Storage API token from your project
+
+`-b, --branches `
+: Comma-separated list of branch IDs or name globs (use "*" for all)
+
+`--allow-target-env`
+: Allow usage of `KBC_PROJECT_ID` and `KBC_BRANCH_ID` environment variables for future operations
+
+[Global Options](/cli/commands/#global-options)
+
+## Examples
+
+```
+➜ kbc llm init
+
+Please enter the Keboola Storage API host, e.g., "connection.keboola.com".
+? API host: connection.north-europe.azure.keboola.com
+
+Please enter the Keboola Storage API token. Its value will be hidden.
+? API token: ***************************************************
+
+Please select which project's branches you want to use with this CLI.
+? Allowed project's branches: only main branch
+
+Created metadata directory ".keboola".
+Created manifest file ".keboola/manifest.json".
+Created file ".env.local" - it contains the API token, keep it local and secret.
+Created file ".env.dist" - an ".env.local" template.
+Created file ".gitignore" - to keep ".env.local" local.
+```
+
+## Next Steps
+
+- [LLM Export](/cli/commands/llm/export/)
+- [All Commands](/cli/commands/)
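
Note on the inline documentation fields: the export page in this diff says the generated JSON carries `_comment`, `_purpose`, and `_update_frequency` fields for AI assistants. A consumer that only wants the data may need to drop them. The following is a minimal Python sketch of that idea, not part of the CLI. The three key names come from the export description; the sample document, its `buckets` layout, and the `strip_doc_fields` helper are hypothetical, since the diff does not show actual file contents.

```python
import json

# Inline documentation keys named in the export description of this diff.
DOC_KEYS = {"_comment", "_purpose", "_update_frequency"}


def strip_doc_fields(node):
    """Return a copy of a parsed JSON value without inline documentation keys.

    Hypothetical helper for illustration; it is not provided by the kbc CLI.
    """
    if isinstance(node, dict):
        # Rebuild the object, skipping documentation keys at this level.
        return {k: strip_doc_fields(v) for k, v in node.items() if k not in DOC_KEYS}
    if isinstance(node, list):
        return [strip_doc_fields(item) for item in node]
    # Scalars (str, int, float, bool, None) are returned unchanged.
    return node


# Made-up sample: the real structure of exported files is an assumption here.
sample = json.loads("""
{
  "_comment": "Index of buckets",
  "_purpose": "Lookup",
  "buckets": [
    {"id": "in.c-main", "_update_frequency": "on export", "tables": ["orders"]}
  ]
}
""")

cleaned = strip_doc_fields(sample)
print(json.dumps(cleaned, indent=2))
```

The helper builds a new structure rather than mutating its input, so the annotated original stays available to tools that do want the documentation fields.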