Convert, extract, and process documents from the command line
Installation | Quick Start | Usage | Documentation
A powerful command-line interface for the Datalab document processing API. Built in Rust for speed and reliability.
- 📄 Document Conversion — Convert PDFs, images, and documents to Markdown, HTML, JSON, or semantic chunks
- 🔍 Structured Extraction — Extract data using JSON schemas with confidence scores
- 📝 Form Filling — Fill PDF forms programmatically with smart field matching
- ⚡ Smart Caching — Local file-based caching reduces API costs on repeated requests
- 🤖 Agent-Friendly — JSON output to stdout, progress events to stderr, designed for piping
- 📊 Progress Streaming — Real-time JSON progress events for monitoring long operations
From crates.io:

```bash
cargo install datalab-cli
```

From source:

```bash
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo install --path .
```

Or download a pre-built binary from GitHub Releases.
1. Get your API key from datalab.to/app/keys.
2. Set the environment variable:

   ```bash
   export DATALAB_API_KEY="your-api-key"
   ```

3. Convert your first document:

   ```bash
   datalab convert document.pdf
   ```

That's it! The converted Markdown is written to stdout as JSON.
Convert documents:

```bash
# Convert to Markdown (default)
datalab convert document.pdf
# Convert to HTML
datalab convert document.pdf --output-format html
# High-quality mode for complex documents
datalab convert report.pdf --mode accurate
# Convert specific pages
datalab convert book.pdf --page-range "0-10"
# Save to file
datalab convert document.pdf --output result.json
```

Extract structured data:

```bash
# Extract with an inline schema
datalab extract invoice.pdf --schema '{
  "fields": [
    {"name": "total", "type": "number"},
    {"name": "date", "type": "string"}
  ]
}'
# Extract with a schema file
datalab extract invoice.pdf --schema schema.json
# Include confidence scores
datalab extract invoice.pdf --schema schema.json --include-scores
```

Fill forms:

```bash
# Fill a form
datalab fill application.pdf \
  --fields '{"name": "John Doe", "email": "john@example.com"}' \
  --output filled.pdf
```

Manage files:

```bash
# Upload a file
datalab files upload document.pdf
# List files
datalab files list
# Download a file
datalab files download file_abc123 --output downloaded.pdf
```

Manage the cache:

```bash
# View cache stats
datalab cache stats
# Clear old entries
datalab cache clear --older-than 7
```

All commands write JSON to stdout for easy piping:
```bash
# Pipe to jq
datalab convert document.pdf | jq '.content'
# Save to file
datalab convert document.pdf > result.json
```

Progress events stream to stderr as JSON, one object per line:
```json
{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"poll","status":"processing","elapsed_secs":1.2}
{"type":"complete","elapsed_secs":3.4}
```

Use `--quiet` to suppress progress events, or `--verbose` to force them on.
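Because the result goes to stdout and progress goes to stderr, a wrapper program can consume the two streams independently. The Rust sketch below spawns the CLI, reports each progress event's `type` as it arrives, and collects the JSON result. It assumes `datalab` is on `PATH` and that events are compact, one-per-line JSON as in the sample above. `run_with_progress` and `event_type` are hypothetical helpers, not part of the CLI, and the scanner in `event_type` is a deliberately minimal stand-in for a real JSON parser such as `serde_json`.

```rust
use std::io::{self, BufRead, BufReader, Read};
use std::process::{Command, Stdio};
use std::thread;

/// Returns the `"type"` value of one progress line, e.g. "start" or "poll".
/// Minimal scanner (assumption): compact JSON with no space after the colon
/// and no escaped quotes in the value. Real code should use a JSON parser.
fn event_type(line: &str) -> Option<&str> {
    let key = "\"type\":\"";
    let start = line.find(key)? + key.len();
    let len = line[start..].find('"')?;
    Some(&line[start..start + len])
}

/// Hypothetical helper: runs `program args...`, reports each stderr progress
/// event as it arrives, and returns (stdout, event types, exit success).
fn run_with_progress(program: &str, args: &[&str]) -> io::Result<(String, Vec<String>, bool)> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Drain stdout on its own thread so a full pipe can never stall the child.
    let mut stdout = child.stdout.take().expect("stdout is piped");
    let reader = thread::spawn(move || -> io::Result<String> {
        let mut buf = String::new();
        stdout.read_to_string(&mut buf)?;
        Ok(buf)
    });

    // Progress events arrive on stderr, one JSON object per line.
    let stderr = child.stderr.take().expect("stderr is piped");
    let mut events = Vec::new();
    for line in BufReader::new(stderr).lines() {
        let line = line?;
        if let Some(kind) = event_type(&line) {
            eprintln!("progress: {kind}");
            events.push(kind.to_string());
        }
    }

    let output = reader.join().expect("stdout reader thread panicked")?;
    let status = child.wait()?;
    Ok((output, events, status.success()))
}

fn main() {
    match run_with_progress("datalab", &["convert", "document.pdf"]) {
        Ok((json, events, ok)) => {
            println!("success={ok} events={events:?}");
            println!("{json}");
        }
        Err(err) => eprintln!("could not run datalab: {err}"),
    }
}
```

Reading stdout on a separate thread is the important design choice: if both pipes were read sequentially on one thread, a large result could fill the stdout pipe while the wrapper is still waiting on stderr, and both processes would block.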
The CLI recognizes these environment variables:

| Variable | Required | Description |
|---|---|---|
| `DATALAB_API_KEY` | Yes | Your API key |
| `DATALAB_BASE_URL` | No | Custom API endpoint (for on-prem deployments) |
| `NO_COLOR` | No | Disable colored output |
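A program that wraps or mimics the CLI typically resolves these variables once at startup. The Rust sketch below is illustrative only: the `Settings` struct and `resolve` function are hypothetical, not the CLI's internals. It treats empty values as unset and follows the no-color.org convention (any present, non-empty `NO_COLOR` disables color); whether `datalab` itself handles empty values this way is an assumption.

```rust
use std::env;

/// Hypothetical settings type mirroring the variables in the table above.
#[derive(Debug)]
struct Settings {
    api_key: String,
    /// `None` means "use the CLI's default endpoint" (not documented here).
    base_url: Option<String>,
    color: bool,
}

/// Resolves settings through `lookup`, so the logic can be tested without
/// mutating the real process environment. Empty values count as unset.
fn resolve<F: Fn(&str) -> Option<String>>(lookup: F) -> Result<Settings, String> {
    let api_key = lookup("DATALAB_API_KEY")
        .filter(|v| !v.is_empty())
        .ok_or("DATALAB_API_KEY is required but not set")?;
    let base_url = lookup("DATALAB_BASE_URL").filter(|v| !v.is_empty());
    // no-color.org convention: a present, non-empty NO_COLOR disables color.
    let color = lookup("NO_COLOR").map_or(true, |v| v.is_empty());
    Ok(Settings { api_key, base_url, color })
}

fn main() {
    match resolve(|name| env::var(name).ok()) {
        // Report only the key's length so the secret never reaches the terminal.
        Ok(s) => println!(
            "key set ({} chars), base_url={:?}, color={}",
            s.api_key.len(),
            s.base_url,
            s.color
        ),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Passing a lookup closure instead of calling `env::var` directly keeps the resolution logic pure, which makes it easy to unit-test with a fake environment.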
Results are cached locally in `~/.cache/datalab/` to reduce API costs:

```bash
# First run: calls the API
datalab convert document.pdf
# Second run: served instantly from the cache
datalab convert document.pdf
# Bypass the cache
datalab convert document.pdf --skip-cache
```

Full documentation is available in the `documentation` directory. To view it locally:
```bash
cd documentation
pip install -r requirements.txt
mkdocs serve
```

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
```bash
# Development setup
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo build
# Run tests
cargo test
# Run lints
cargo clippy
cargo fmt --check
```

MIT License. See LICENSE for details.
Built with Rust | Powered by Datalab