eXtensible Language Computation Runtime - a document conversion and splitting toolkit for the JVM.
XLCR converts between document formats (PDF, DOCX, XLSX, PPTX, HTML, ODS, and more) and splits documents into fragments (pages, sheets, slides, attachments). It ships as both a CLI tool and a library publishable to Maven Central.
| Module | Artifact | Description |
|---|---|---|
| core | xlcr-core | Tika text extraction, document splitters (PDF/Excel/PowerPoint/Word/Email/Archive), XLSX-to-ODS conversion |
| core-aspose | xlcr-core-aspose | Aspose-powered conversions: PDF/DOCX/XLSX/PPTX/HTML with HIGH priority (commercial license required) |
| core-libreoffice | xlcr-core-libreoffice | LibreOffice-powered conversions: DOC/XLS/PPT/ODS to PDF as an open-source fallback |
| core-spark | xlcr-core-spark | Spark DataFrame integration for batch document processing |
| xlcr | xlcr | Unified CLI with compile-time transform discovery and automatic backend fallback (Scala 3 only) |
| server | xlcr-server | HTTP REST API for document conversion and splitting (Scala 3 only) |
Backend selection is automatic: Aspose (HIGH priority) > LibreOffice (DEFAULT) > Core. You can also select a backend explicitly with `--backend aspose` or `--backend libreoffice`.
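The priority order above can be pictured as a sort-then-fallback loop. The following is a minimal sketch under stated assumptions, not XLCR's actual implementation: `BackendFallbackSketch`, `Backend`, `candidates`, `convertWithFallback`, and the numeric priority values are all hypothetical, and Core's LOW rank is assumed because only its position below LibreOffice is stated. It also assumes that an explicit choice (modelled on `--backend`) restricts the candidates to that single backend.

```scala
// Hypothetical model of priority-based backend selection; not XLCR's real API.
object BackendFallbackSketch {
  // Assumed numeric ranks: higher is tried first. Core's LOW rank is an assumption.
  val High = 2
  val Default = 1
  val Low = 0

  // A backend, the (fromFormat, toFormat) pairs it handles, and whether it is usable
  // (e.g. present on the classpath, or LibreOffice installed). All fields are illustrative.
  final case class Backend(name: String, priority: Int, supports: Set[(String, String)], available: Boolean)

  // Eligible backends for from -> to, highest priority first. An explicit choice
  // (modelled on `--backend`) restricts the list to that backend only (assumption).
  def candidates(all: Seq[Backend], from: String, to: String, explicit: Option[String] = None): Seq[Backend] =
    all
      .filter(b => b.available && b.supports((from, to)))
      .filter(b => explicit.forall(_ == b.name))
      .sortBy(b => -b.priority)

  // Try candidates in order; the first success wins, otherwise collect every error.
  def convertWithFallback(
      cands: Seq[Backend],
      run: Backend => Either[String, Array[Byte]]
  ): Either[List[String], (String, Array[Byte])] = {
    @annotation.tailrec
    def loop(rest: List[Backend], errors: List[String]): Either[List[String], (String, Array[Byte])] =
      rest match {
        case Nil => Left(errors.reverse)
        case b :: tail =>
          run(b) match {
            case Right(bytes) => Right((b.name, bytes))
            case Left(err)    => loop(tail, s"${b.name}: $err" :: errors)
          }
      }
    loop(cands.toList, Nil)
  }
}
```

For a DOCX-to-PDF request where both Aspose and LibreOffice qualify, `candidates` returns Aspose first, and `convertWithFallback` moves on to LibreOffice only if the Aspose attempt returns `Left`.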
- Java 17+ (tested with Java 17 and 21)
- Mill build tool (included via the `./mill` wrapper script)
- LibreOffice (optional, for the `core-libreoffice` backend)
- Aspose license (optional, for the `core-aspose` backend without watermarks)
```bash
git clone https://github.com/TJC-LP/xlcr.git
cd xlcr

# Build and install to ~/bin (no sudo)
make install-user

# Or install to /usr/local/bin (requires sudo)
make install
```

```scala
// build.mill (Mill)
def mvnDeps = Seq(
  mvn"com.tjclp::xlcr-core:0.1.0",
  mvn"com.tjclp::xlcr-core-aspose:0.1.0" // optional
)
```

```scala
// build.sbt (sbt)
libraryDependencies ++= Seq(
  "com.tjclp" %% "xlcr-core" % "0.1.0",
  "com.tjclp" %% "xlcr-core-aspose" % "0.1.0" // optional
)
```

Cross-published for Scala 3.3.4 and Scala 2.13.14.
```bash
# Convert Word to PDF
xlcr convert -i document.docx -o output.pdf

# Convert with a specific backend
xlcr convert -i document.docx -o output.pdf --backend libreoffice

# Convert HTML to PowerPoint
xlcr convert -i presentation.html -o output.pptx

# Convert PDF to HTML (recommended for best editability)
xlcr convert -i document.pdf -o output.html
```

```bash
# Split PDF into individual pages
xlcr split -i document.pdf -d pages/

# Split Excel workbook into sheets
xlcr split -i workbook.xlsx -d sheets/

# Split PowerPoint into slides
xlcr split -i presentation.pptx -d slides/

# Extract email attachments
xlcr split -i message.eml -d attachments/
```

```bash
# Show document metadata
xlcr info -i document.pdf

# List all supported conversions
xlcr --backend-info

# Version
xlcr --version
```

```bash
# Strip template/branding for clean output
xlcr convert -i branded.pptx -o clean.html --strip-masters

# Two-stage PDF to PowerPoint (best editability, smallest files)
xlcr convert -i document.pdf -o intermediate.html
xlcr convert -i intermediate.html -o presentation.pptx
```

The `server` module exposes document conversion, splitting, and metadata extraction as a REST API:
```bash
# Start the server
./mill 'server[3.3.4].run'

# Or with custom port
XLCR_PORT=9000 ./mill 'server[3.3.4].run'
```

| Method | Path | Description |
|---|---|---|
| POST | `/convert?to=<mime>` | Convert document to target format |
| POST | `/split` | Split document into fragments (ZIP output) |
| POST | `/info` | Get document metadata |
| GET | `/capabilities` | List all supported conversions |
| GET | `/health` | Health check |
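To call these endpoints from JVM code rather than `curl`, requests can be built with the JDK's `java.net.http` client. This is a minimal sketch, not a client shipped with XLCR: it assumes the server is reachable at `http://localhost:8080` and that a successful conversion answers with HTTP 200, and the names `XlcrClientSketch`, `convertRequest`, `splitRequest`, and `zipEntryNames` are hypothetical. Like `--data-binary`, it posts the raw document bytes with the input's MIME type as `Content-Type`, and it lists the entries of the ZIP archive that `/split` returns.

```scala
import java.io.ByteArrayInputStream
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.{Files, Paths}
import java.util.zip.ZipInputStream

// Hypothetical helper around the documented endpoints; not part of XLCR itself.
object XlcrClientSketch {
  // Assumption: server started locally on port 8080.
  val BaseUrl = "http://localhost:8080"

  val DocxMime = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

  // POST /convert?to=<target> with the raw document bytes as the request body.
  def convertRequest(body: Array[Byte], contentType: String, target: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$BaseUrl/convert?to=$target"))
      .header("Content-Type", contentType)
      .POST(HttpRequest.BodyPublishers.ofByteArray(body))
      .build()

  // POST /split; per the table above, the response body is a ZIP of fragments.
  def splitRequest(body: Array[Byte], contentType: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$BaseUrl/split"))
      .header("Content-Type", contentType)
      .POST(HttpRequest.BodyPublishers.ofByteArray(body))
      .build()

  // List the entry names of an in-memory ZIP (e.g. a /split response body).
  def zipEntryNames(zipBytes: Array[Byte]): List[String] = {
    val zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))
    try Iterator.continually(zis.getNextEntry).takeWhile(_ != null).map(_.getName).toList
    finally zis.close()
  }

  // Usage: convert document.docx to output.pdf (requires a running server).
  def main(args: Array[String]): Unit = {
    val client = HttpClient.newHttpClient()
    val docx = Files.readAllBytes(Paths.get("document.docx"))
    val response = client.send(convertRequest(docx, DocxMime, "pdf"), HttpResponse.BodyHandlers.ofByteArray())
    // Assumption: HTTP 200 signals a successful conversion.
    if (response.statusCode() == 200) Files.write(Paths.get("output.pdf"), response.body())
    else System.err.println(s"Conversion failed with HTTP ${response.statusCode()}")
  }
}
```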
```bash
# Convert DOCX to PDF
curl -X POST "http://localhost:8080/convert?to=pdf" \
  -H "Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
  --data-binary @document.docx -o output.pdf

# Split XLSX into sheets
curl -X POST "http://localhost:8080/split" \
  -H "Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" \
  --data-binary @workbook.xlsx -o sheets.zip

# List capabilities
curl http://localhost:8080/capabilities
```

```bash
./mill __.compile          # Compile all modules
./mill __[3.3.4].test      # Run all tests (Scala 3)
./mill __[2.13.14].test    # Run all tests (Scala 2.13)
./mill core[3.3.4].test    # Run tests for a specific module
./mill __.checkFormat      # Check code formatting
./mill __.reformat         # Fix formatting
./mill __.assembly         # Build fat JARs
```

For the `core-libreoffice` module, install LibreOffice:
```bash
# macOS
brew install --cask libreoffice

# Ubuntu/Debian
sudo apt-get install libreoffice

# Custom path
export LIBREOFFICE_HOME=/path/to/libreoffice
```

To run `core-aspose` tests without watermarks, provide an Aspose license in one of two ways:

```bash
# Option 1: Copy license to resources
cp Aspose.Java.Total.lic core-aspose/resources/

# Option 2: Environment variable
export ASPOSE_TOTAL_LICENSE_B64=$(base64 < Aspose.Java.Total.lic)
```

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Note: The Aspose module requires valid Aspose licenses for production use. Evaluation/trial licenses can be obtained from Aspose directly.