Datalab CLI

Convert, extract, and process documents from the command line

Installation | Quick Start | Usage | Documentation

A powerful command-line interface for the Datalab document processing API. Built in Rust for speed and reliability.

Features

📄 Document Conversion — Convert PDFs, images, and documents to Markdown, HTML, JSON, or semantic chunks
🔍 Structured Extraction — Extract data using JSON schemas with confidence scores
📝 Form Filling — Fill PDF forms programmatically with smart field matching
⚡ Smart Caching — Local file-based caching reduces API costs on repeated requests
🤖 Agent-Friendly — JSON output to stdout, progress events to stderr, designed for piping
📊 Progress Streaming — Real-time JSON progress events for monitoring long operations

Installation

From crates.io

cargo install datalab-cli

From source

git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo install --path .

Pre-built binaries

Download a pre-built binary from the repository's GitHub Releases page.

Quick Start

1. Get your API key from datalab.to/app/keys

2. Set the environment variable

export DATALAB_API_KEY="your-api-key"

3. Convert your first document

datalab convert document.pdf

That's it! The converted Markdown is written to stdout as JSON.

Usage

Convert Documents

# Convert to markdown (default)
datalab convert document.pdf

# Convert to HTML
datalab convert document.pdf --output-format html

# High-quality mode for complex documents
datalab convert report.pdf --mode accurate

# Convert specific pages
datalab convert book.pdf --page-range "0-10"

# Save to file
datalab convert document.pdf --output result.json

Extract Structured Data

# Extract with inline schema
datalab extract invoice.pdf --schema '{
  "fields": [
    {"name": "total", "type": "number"},
    {"name": "date", "type": "string"}
  ]
}'

# Extract with schema file
datalab extract invoice.pdf --schema schema.json

# Include confidence scores
datalab extract invoice.pdf --schema schema.json --include-scores
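
A schema file can be generated rather than written by hand. The sketch below, in Python, produces the same two-field schema shown in the inline example. The `fields`/`name`/`type` layout is taken from that example; the helper name `build_schema` is hypothetical, and the set of accepted `type` values is not documented here.

```python
import json


def build_schema(fields):
    """Return a schema dict in the {"fields": [{"name", "type"}, ...]} shape.

    `fields` is a list of (name, type) pairs. The layout mirrors the inline
    example in this README; this helper is illustrative, not part of the CLI.
    """
    return {"fields": [{"name": name, "type": type_} for name, type_ in fields]}


schema = build_schema([("total", "number"), ("date", "string")])
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Redirecting the printed output to schema.json gives a file that can be passed with --schema schema.json.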

Fill Forms

# Fill a form
datalab fill application.pdf \
  --fields '{"name": "John Doe", "email": "john@example.com"}' \
  --output filled.pdf

File Management

# Upload a file
datalab files upload document.pdf

# List files
datalab files list

# Download a file
datalab files download file_abc123 --output downloaded.pdf

Cache Management

# View cache stats
datalab cache stats

# Clear old entries
datalab cache clear --older-than 7

Output Format

All commands output JSON to stdout for easy piping:

# Pipe to jq
datalab convert document.pdf | jq '.content'

# Save to file
datalab convert document.pdf > result.json
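
Because the result arrives as JSON on stdout, a script can call the CLI and read fields directly. The following is a minimal Python sketch, not an official client. It uses only flags shown in this README (`--output-format`, `--skip-cache`) and the `content` key from the jq example; the function names are hypothetical, and it assumes the CLI exits with a non-zero status on failure.

```python
import json
import subprocess


def convert_argv(path, output_format=None, skip_cache=False):
    """Build the argument list for `datalab convert` from flags shown in this README."""
    argv = ["datalab", "convert", path]
    if output_format is not None:
        argv += ["--output-format", output_format]
    if skip_cache:
        argv.append("--skip-cache")
    return argv


def content_from_stdout(stdout_text):
    """Parse the CLI's JSON stdout and return its `content` field."""
    return json.loads(stdout_text)["content"]


def convert(path, **options):
    """Run the CLI and return the converted content (needs `datalab` on PATH)."""
    result = subprocess.run(
        convert_argv(path, **options),
        capture_output=True,  # keep stdout (result) and stderr (progress) separate
        text=True,
        check=True,  # assumption: failures are signalled by a non-zero exit status
    )
    return content_from_stdout(result.stdout)
```

With the binary installed and DATALAB_API_KEY set, `convert("document.pdf", output_format="html")` would return the converted text as a string.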

Progress events stream to stderr as JSON:

{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"poll","status":"processing","elapsed_secs":1.2}
{"type":"complete","elapsed_secs":3.4}

Use --quiet to suppress progress, --verbose to force it.
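
Since progress events are one JSON object per line, a monitoring script can parse stderr line by line. The Python sketch below runs against the three sample events shown above. The function name `parse_progress` is hypothetical, and skipping non-JSON lines is a defensive assumption, since stderr might also carry plain-text diagnostics.

```python
import json

SAMPLE_STDERR = """\
{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"poll","status":"processing","elapsed_secs":1.2}
{"type":"complete","elapsed_secs":3.4}
"""


def parse_progress(stderr_text):
    """Return the progress events found in stderr, one dict per JSON line.

    Blank lines and lines that are not JSON objects with a "type" key are
    skipped (an assumption about stderr, not documented behavior).
    """
    events = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and "type" in event:
            events.append(event)
    return events


events = parse_progress(SAMPLE_STDERR)
print([e["type"] for e in events])  # ['start', 'poll', 'complete']
print(events[-1]["elapsed_secs"])   # 3.4
```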

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `DATALAB_API_KEY` | Yes | Your API key |
| `DATALAB_BASE_URL` | No | Custom API endpoint (for on-prem) |
| `NO_COLOR` | No | Disable colored output |

Caching

Results are cached locally in ~/.cache/datalab/ to reduce API costs:

# First run: calls API
datalab convert document.pdf

# Second run: instant from cache
datalab convert document.pdf

# Bypass cache
datalab convert document.pdf --skip-cache
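
This README does not say how cache entries are keyed. One common approach for a file-based cache, sketched below in Python, is to hash the input file's bytes together with the options that affect the result, so the same file with the same options always maps to the same entry. Everything here (the name `cache_key`, the use of SHA-256, the option dictionary) is a hypothetical illustration of the idea, not the CLI's actual scheme.

```python
import hashlib
import json


def cache_key(file_bytes, options):
    """Derive a deterministic key from file contents plus request options.

    Hypothetical scheme: SHA-256 over the file bytes followed by the options
    serialized as JSON with sorted keys, so option order does not matter.
    """
    digest = hashlib.sha256()
    digest.update(file_bytes)
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


pdf = b"%PDF-1.7 example bytes"
markdown_key = cache_key(pdf, {"output_format": "markdown", "mode": "accurate"})
same_key = cache_key(pdf, {"mode": "accurate", "output_format": "markdown"})
html_key = cache_key(pdf, {"output_format": "html", "mode": "accurate"})

print(markdown_key == same_key)  # True: same file and options, any order
print(markdown_key == html_key)  # False: a different option changes the key
```

Under a scheme like this, editing the file or changing any option yields a new key, which is why a second identical run can be served locally while a modified request goes back to the API.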

Documentation

Full documentation is available in the documentation directory. To build and serve it locally with MkDocs:

cd documentation
pip install -r requirements.txt
mkdocs serve

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo build

# Run tests
cargo test

# Run lints
cargo clippy
cargo fmt --check

License

MIT License - see LICENSE for details.

Built with Rust | Powered by Datalab
