dipankar/datalab-cli

Datalab CLI

Convert, extract, and process documents from the command line

A command-line interface for the Datalab document processing API, written in Rust.

Features

  • 📄 Document Conversion — Convert PDFs, images, and documents to Markdown, HTML, JSON, or semantic chunks
  • 🔍 Structured Extraction — Extract data using JSON schemas with confidence scores
  • 📝 Form Filling — Fill PDF forms programmatically with smart field matching
  • ⚡ Smart Caching — Local file-based caching reduces API costs on repeated requests
  • 🤖 Agent-Friendly — JSON output to stdout, progress events to stderr, designed for piping
  • 📊 Progress Streaming — Real-time JSON progress events for monitoring long operations

Installation

From crates.io

cargo install datalab-cli

From source

git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo install --path .

Pre-built binaries

Download from GitHub Releases.

Quick Start

1. Get your API key from datalab.to/app/keys

2. Set the environment variable

export DATALAB_API_KEY="your-api-key"

3. Convert your first document

datalab convert document.pdf

That's it! The result, including the converted Markdown, is printed to stdout as JSON.

Usage

Convert Documents

# Convert to markdown (default)
datalab convert document.pdf

# Convert to HTML
datalab convert document.pdf --output-format html

# High-quality mode for complex documents
datalab convert report.pdf --mode accurate

# Convert specific pages
datalab convert book.pdf --page-range "0-10"

# Save to file
datalab convert document.pdf --output result.json
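To convert a whole folder, you can loop over the files in the shell and reuse the `--output` flag shown above. This is a minimal sketch, assuming the PDFs sit in a single directory; `convert_all` is a hypothetical helper, not a datalab-cli command. Re-running it should be served from the local cache described under Caching for files that were already converted.

```shell
# Hypothetical helper (not part of datalab-cli): convert every PDF in a
# directory, writing one JSON result next to each input file.
convert_all() {
  dir="${1:-.}"
  for pdf in "$dir"/*.pdf; do
    # If the directory has no PDFs, the glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    # document.pdf -> document.json
    datalab convert "$pdf" --output "${pdf%.pdf}.json"
  done
}
```

Usage: `convert_all ./invoices`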

Extract Structured Data

# Extract with inline schema
datalab extract invoice.pdf --schema '{
  "fields": [
    {"name": "total", "type": "number"},
    {"name": "date", "type": "string"}
  ]
}'

# Extract with schema file
datalab extract invoice.pdf --schema schema.json

# Include confidence scores
datalab extract invoice.pdf --schema schema.json --include-scores
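Keeping the schema in a file avoids the shell-quoting pitfalls of the inline form. A minimal sketch that writes the same two-field schema from the inline example to `schema.json` with a heredoc; the file name is only an example:

```shell
# Write the invoice schema from the inline example to a reusable file.
# The quoted 'EOF' delimiter stops the shell from expanding $ or backticks.
cat > schema.json <<'EOF'
{
  "fields": [
    {"name": "total", "type": "number"},
    {"name": "date", "type": "string"}
  ]
}
EOF
```

Then pass it as in the examples above: `datalab extract invoice.pdf --schema schema.json`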

Fill Forms

# Fill a form
datalab fill application.pdf \
  --fields '{"name": "John Doe", "email": "john@example.com"}' \
  --output filled.pdf
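When field values come from shell variables, building the `--fields` JSON by hand breaks as soon as a value contains a quote or backslash. One option is to let jq handle the escaping. This sketch assumes jq is installed; `build_fields` is a hypothetical helper, and the `name`/`email` keys only mirror the example above, since the real keys depend on your form.

```shell
# Hypothetical helper: build the --fields JSON with jq so values are escaped safely.
# -n: no input, -c: compact single-line output, --arg: bind a string variable.
build_fields() {
  jq -nc --arg name "$1" --arg email "$2" '{name: $name, email: $email}'
}
```

Usage: `datalab fill application.pdf --fields "$(build_fields "John Doe" "john@example.com")" --output filled.pdf`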

File Management

# Upload a file
datalab files upload document.pdf

# List files
datalab files list

# Download a file
datalab files download file_abc123 --output downloaded.pdf

Cache Management

# View cache stats
datalab cache stats

# Clear old entries
datalab cache clear --older-than 7

Output Format

All commands output JSON to stdout for easy piping:

# Pipe to jq
datalab convert document.pdf | jq '.content'

# Save to file
datalab convert document.pdf > result.json

Progress events stream to stderr as JSON:

{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"poll","status":"processing","elapsed_secs":1.2}
{"type":"complete","elapsed_secs":3.4}

Use --quiet to suppress progress, --verbose to force it.
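Because results go to stdout and progress goes to stderr, the two streams can be captured separately and inspected afterwards. A minimal sketch, assuming each event is written as one line of compact JSON as in the sample above; `last_elapsed` is a hypothetical helper, and jq is a more robust parser if the event format differs.

```shell
# Capture the result and the progress events in separate files:
#   datalab convert document.pdf > result.json 2> progress.log

# Hypothetical helper: print elapsed_secs from the last "complete" event in a log.
# Assumes one compact JSON event per line, as in the sample output.
last_elapsed() {
  grep '"type":"complete"' "$1" \
    | sed -E 's/.*"elapsed_secs":([0-9.]+).*/\1/' \
    | tail -n 1
}
```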

Environment Variables

Variable          Required  Description
DATALAB_API_KEY   Yes       Your API key
DATALAB_BASE_URL  No        Custom API endpoint (for on-prem)
NO_COLOR          No        Disable colored output
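In scripts, checking the required variable before calling the CLI makes a missing key fail early with a clear message. A small sketch; `require_api_key` is a hypothetical helper, not a datalab-cli command.

```shell
# Hypothetical helper: stop early if DATALAB_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${DATALAB_API_KEY:-}" ]; then
    echo "error: DATALAB_API_KEY is not set (get one at datalab.to/app/keys)" >&2
    return 1
  fi
}

# Example: require_api_key && datalab convert document.pdf
```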

Caching

Results are cached locally in ~/.cache/datalab/ to reduce API costs:

# First run: calls API
datalab convert document.pdf

# Second run: served from the local cache, no API call
datalab convert document.pdf

# Bypass cache
datalab convert document.pdf --skip-cache

Documentation

Full documentation is available in the documentation directory. To view locally:

cd documentation
pip install -r requirements.txt
mkdocs serve

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo build

# Run tests
cargo test

# Run lints
cargo clippy
cargo fmt --check

License

MIT License - see LICENSE for details.


Built with Rust | Powered by Datalab