Translate EPUB files using Large Language Models (DeepSeek, OpenAI, and other OpenAI-compatible APIs)
Ebook Translator is a Python tool that translates EPUB files using Large Language Models (LLMs) such as DeepSeek, OpenAI, and other OpenAI-compatible APIs. The tool intelligently segments ebook content, translates it using asynchronous LLM calls, and reconstructs the translated EPUB while preserving structure and metadata.
- EPUB Translation: Translates entire EPUB files while maintaining structure
- LLM-Powered: Uses advanced language models (DeepSeek, OpenAI, etc.)
- Smart Segmentation: Intelligently chunks content with token limits and overlap
- Async Processing: Parallelizes translation calls for better performance
- Metadata Preservation: Keeps original title, authors, and structure
- HTML Structure: Preserves formatting, images, CSS, and layout
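To illustrate what "token limits and overlap" means, here is a minimal sketch. It is not the project's Segmentator. It approximates tokens by whitespace-separated words, which is an assumption (real LLM tokenizers count differently). It measures overlap in paragraphs rather than tokens. The names `Chunk`, `approx_tokens` and `segment` are hypothetical. Each chunk carries the tail of the previous chunk as read-only context, so the model sees surrounding text without translating it twice.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    body: list[str]                                    # paragraphs to translate
    context: list[str] = field(default_factory=list)   # tail of previous chunk, context only


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def segment(paragraphs: list[str], max_tokens: int, overlap: int = 1) -> list[Chunk]:
    """Pack paragraphs into chunks of at most `max_tokens` (approximate) tokens.

    Each chunk also carries the last `overlap` paragraphs of the previous chunk
    as context. A single paragraph larger than `max_tokens` gets its own chunk.
    """
    bodies: list[list[str]] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        cost = approx_tokens(para)
        # Close the current chunk if this paragraph would exceed the budget.
        if current and used + cost > max_tokens:
            bodies.append(current)
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        bodies.append(current)

    chunks: list[Chunk] = []
    previous: list[str] = []
    for body in bodies:
        # Attach the end of the previous chunk as overlap context.
        context = previous[-overlap:] if overlap > 0 else []
        chunks.append(Chunk(body=body, context=context))
        previous = body
    return chunks
```

The token budget here covers only the paragraphs to translate. A real implementation would also reserve room for the prompt and the context overlap.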
Requirements:
- Python 3.12 or higher
- Poetry (for dependency management)
- API key for DeepSeek or OpenAI
Installation:
- Clone the repository:
  git clone https://github.com/NeOzay/ebook-translator.git
  cd ebook-translator
- Install dependencies:
  poetry install
- Configure API keys:
  cp .env.example .env
  Edit .env and add your API key:
  API_KEY=sk-your-api-key-here
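The .env file is a plain list of KEY=VALUE lines. This README does not say how the project loads it; a helper library is likely. The sketch below is a minimal stdlib-only illustration of the format. The functions `parse_env` and `load_env_file` are hypothetical. It skips blank lines, `#` comments and lines without `=`, and strips one pair of surrounding quotes.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no "=", so not an assignment
        value = value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "abc" or 'abc'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    # Hypothetical convenience wrapper: read the file and parse it.
    return parse_env(Path(path).read_text(encoding="utf-8"))
```

For example, `load_env_file()["API_KEY"]` would return `sk-your-api-key-here` for the file created in the step above.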
DeepSeek (Recommended):
- Create an account at DeepSeek Platform
- Navigate to API Keys
- Generate a new API key
OpenAI (Alternative):
- Create an account at OpenAI Platform
- Navigate to API Keys
- Generate a new API key
Create a Python file (e.g., translate.py):
from ebook_translator import LLM, BilingualFormat, EpubTranslator, Language

# Configure the LLM
llm = LLM(
    model_name="deepseek-chat",
    log_dir="logs",
    url="https://api.deepseek.com",
    max_tokens=1300,
)

# Translate the EPUB
translator = EpubTranslator(llm, epub_path="my_book.epub")
translator.translate(
    target_language=Language.FRENCH,
    output_epub="my_book_translated.epub",
    max_concurrent=5,
    bilingual_format=BilingualFormat.SEPARATE_TAG,
)

Then run:
python translate.py

Bilingual formats (bilingual_format parameter):
- BilingualFormat.INLINE: Original and translation in the same paragraph
- BilingualFormat.SEPARATE_TAG: Original and translation in separate paragraphs
- BilingualFormat.DISABLE: Translation completely replaces the original
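The three bilingual modes can be pictured as different ways of writing a translation back into an XHTML paragraph. The sketch below illustrates this and is not the project's HtmlPage code. It uses the standard-library `xml.etree.ElementTree` and only touches plain-text `<p>` elements. It defines its own stand-in `Mode` enum rather than importing `BilingualFormat`. The markup choices are assumptions: a `<br/>` separates the two texts for the inline mode, and a sibling `<p>` holds the translation for the separate-tag mode.

```python
import xml.etree.ElementTree as ET
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for ebook_translator.BilingualFormat."""
    INLINE = "inline"
    SEPARATE_TAG = "separate_tag"
    DISABLE = "disable"


def apply_translations(body: ET.Element, translations: dict[str, str], mode: Mode) -> None:
    """Rewrite plain-text <p> children of `body` in place.

    `translations` maps each original paragraph text to its translation.
    Paragraphs with nested markup or no translation are left untouched.
    """
    new_children = []
    for child in list(body):
        local_name = child.tag.rsplit("}", 1)[-1]  # drop an XHTML namespace, if any
        namespace = child.tag[: len(child.tag) - len(local_name)]
        translated = translations.get(child.text or "")
        if local_name != "p" or len(child) or translated is None:
            new_children.append(child)
            continue
        if mode is Mode.DISABLE:
            # Replace the original text entirely.
            child.text = translated
            new_children.append(child)
        elif mode is Mode.INLINE:
            # Keep both texts in one paragraph, separated by a line break.
            br = ET.SubElement(child, namespace + "br")
            br.tail = translated
            new_children.append(child)
        else:
            # SEPARATE_TAG: emit the translation as a sibling paragraph.
            twin = ET.Element(child.tag)
            twin.text = translated
            twin.tail = child.tail
            new_children.extend([child, twin])
    # Swap in the rebuilt child list, keeping body's own text and attributes.
    for child in list(body):
        body.remove(child)
    body.extend(new_children)
```

Paragraphs containing inline markup (e.g. `<em>`) are skipped here. Handling them is the hard part of real DOM reconstruction.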
See start.py for a complete configuration example with all available parameters.
Environment variables (.env):

| Variable | Required | Default | Description |
|---|---|---|---|
| API_KEY | ✅ Yes | - | API key for the configured provider (DeepSeek or OpenAI) |
Type checking:
pyright src/ebook_translator

Run tests:
pytest tests/

The translation pipeline follows this flow:
- EPUB Loading - Reads EPUB, extracts metadata and spine order
- Segmentation - Chunks content into token-limited segments with overlap
- Translation - Parallelizes LLM translation calls
- Reconstruction - Replaces original text with translations in DOM
- EPUB Generation - Writes new EPUB with translated content
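The first stage depends on the EPUB container layout. `META-INF/container.xml` points to the OPF package document, whose `<spine>` lists `idref`s into the `<manifest>` in reading order. Below is a minimal stdlib sketch of that lookup; the function name `spine_order` is hypothetical. The project may use a dedicated EPUB library instead. This sketch does not decode percent-encoded hrefs.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the EPUB OCF container and OPF package specifications.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_order(epub) -> list[str]:
    """Return the archive paths of the spine documents in reading order.

    `epub` may be a file path or a binary file object.
    """
    with zipfile.ZipFile(epub) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))
    # Manifest hrefs are relative to the directory containing the OPF file.
    base = posixpath.dirname(opf_path)
    manifest = {item.get("id"): item.get("href")
                for item in package.iterfind("opf:manifest/opf:item", NS)}
    return [posixpath.join(base, manifest[ref.get("idref")])
            for ref in package.iterfind("opf:spine/opf:itemref", NS)]
```

Walking the spine rather than the manifest matters: manifest order is arbitrary, while the spine defines the order in which chapters should be segmented and translated.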
Key components:
- Segmentator (segment.py) - Chunks content with token limits and overlap
- HtmlPage (htmlpage.py) - Parses and reconstructs HTML with translations
- AsyncLLMTranslator (llm.py) - Async wrapper for LLM API calls
- TranslationWorkerFuture (worker.py) - Parallelizes translation tasks
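The max_concurrent parameter suggests a cap on simultaneous API requests. A common way to express that in asyncio is a semaphore. The sketch below shows the pattern, with a fake `translate_chunk` coroutine standing in for the real LLM call. It is not the project's AsyncLLMTranslator or TranslationWorkerFuture, and both function names are hypothetical.

```python
import asyncio


async def translate_chunk(text: str) -> str:
    # Placeholder for an HTTP call to an OpenAI-compatible endpoint.
    await asyncio.sleep(0.01)
    return text.upper()


async def translate_all(chunks: list[str], max_concurrent: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(chunk: str) -> str:
        async with semaphore:  # at most max_concurrent calls run at once
            return await translate_chunk(chunk)

    # gather returns results in input order, so they line up with chunks.
    return await asyncio.gather(*(worker(c) for c in chunks))
```

Usage: `asyncio.run(translate_all(["one", "two", "three"], max_concurrent=2))`. Because results come back in input order, the reconstruction step can map each translation back to its segment by position.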
IMPORTANT:
⚠️ NEVER commit the .env file to git (it is already in .gitignore)
⚠️ NEVER share your API keys publicly
⚠️ If a key is compromised, revoke it immediately on the platform
This project is licensed under the MIT License.
NeOzay - neozay.ozay@gmail.com
Contributions are welcome! Please feel free to submit a Pull Request.