This project provides a complete pipeline for translating PDF documents into other languages using a local Large Language Model (LLM). It extracts structured text and images from a PDF, translates the text into the target language using Ollama, and regenerates a new translated PDF — all while preserving layout and formatting.
translator.mov
- Extracts text and images from PDFs using
marker - Translates text to any target language using a local LLM via
ollama - Reconstructs translated content into a well-formatted PDF using
markdown-pdf - Saves extracted and translated text as
.mdfiles for readability and further use - Lightweight, offline-friendly, and privacy-preserving
marker: PDF layout-aware text & image extractionollama: Run LLMs likegemma,mistral, etc. locallymarkdown-pdf: Convert translated Markdown to a clean, printable PDFPython: File handling, automation, and orchestration