Dockerized the app, Wrapped in fastapi and uvicorn app server #418
Open
namanthapliyal wants to merge 9 commits into myshell-ai:main from namanthapliyal:main
+235
−3
9 commits
51f591a
Fastapi app
2b1f76c
Fastapi app comments added
0a6bcd9
FastApi app production level
b583fb6
Updated readme for usage
ef6ab7f
Updated readme for usage
af9b9e6
Updated readme for usage
8c04fa3
updated requirements
b192956
updated requirements
8b3571b
optimized docker file
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 | ||
|
|
||
| # Install Python 3.10 and pip, as well as other dependencies | ||
| RUN apt-get update && \ | ||
| DEBIAN_FRONTEND=noninteractive apt-get install -y \ | ||
| python3.10 \ | ||
| python3.10-distutils \ | ||
| python3-pip \ | ||
| sudo \ | ||
| ffmpeg \ | ||
| git \ | ||
| aria2 \ | ||
| unzip && \ | ||
| rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Optional: ensure python3 points to python3.10 | ||
| RUN ln -sf /usr/bin/python3.10 /usr/bin/python3 | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # Clone OpenVoice (or use COPY for local code) | ||
| RUN git clone https://github.com/namanthapliyal/OpenVoice.git openvoice | ||
|
|
||
| WORKDIR /app/openvoice | ||
|
|
||
| # Install Python dependencies | ||
| RUN python3 -m pip install --upgrade pip && \ | ||
| python3 -m pip install --no-cache-dir -r requirements.txt && \ | ||
| python3 -m pip install --no-cache-dir -e . | ||
|
|
||
| # Download and place checkpoints/resources | ||
| RUN aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/camenduru/OpenVoice/resolve/main/checkpoints_1226.zip -d /app/openvoice -o checkpoints_1226.zip && \ | ||
| unzip /app/openvoice/checkpoints_1226.zip && \ | ||
| rm checkpoints_1226.zip | ||
|
|
||
| EXPOSE 7860 | ||
|
|
||
|
|
||
| CMD ["uvicorn", "fastapi_app:app", "--host", "0.0.0.0", "--port", "7860"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| ## Local Development Setup | ||
|
|
||
| Follow these steps to set up and run the application locally for development and debugging. | ||
|
|
||
| ### 1. Clone the Repository | ||
|
|
||
| First, clone this repository to your local machine: | ||
|
|
||
| ```bash | ||
| git clone https://github.com/namanthapliyal/OpenVoice.git | ||
| cd ./OpenVoice | ||
| ``` | ||
|
|
||
| ### 2. Build the Docker Image | ||
|
|
||
| In the root location of the project, build the Docker image using the following command: | ||
|
|
||
| ```bash | ||
| docker build -t openvoice-fastapi . | ||
| ``` | ||
|
|
||
| This command will: | ||
|
|
||
| - Pull the nvidia/cuda base image. | ||
| - Install necessary system dependencies and Python packages. | ||
| - Clone the OpenVoice library. | ||
| - Download pre-trained checkpoints required for voice synthesis. | ||
| - Set up the working directory and expose the application port. | ||
| - Build the Docker image. | ||
|
|
||
| This process may take some time, especially during the initial download of the base image and checkpoints. | ||
|
|
||
| ### 3. Run the Docker Container | ||
|
|
||
| Once the image is built, you can run a container from it. To enable GPU acceleration and map the application's port to your host machine, use the following command: | ||
|
|
||
| ```bash | ||
| docker run --gpus all -p 7860:7860 openvoice-fastapi | ||
| ``` | ||
|
|
||
| - --gpus all: Exposes all available NVIDIA GPUs on your host to the container. Ensure the NVIDIA Container Toolkit is correctly installed. | ||
| - -p 7860:7860: Maps port 7860 inside the container (where FastAPI runs) to port 7860 on your host machine. | ||
|
|
||
| The FastAPI application will now be accessible at http://localhost:7860. | ||
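The module in this PR loads all checkpoints at import time, so the server may take a while to start answering after `docker run`. A minimal sketch of a readiness check against the root endpoint follows. The helper names `fetch_status` and `wait_until_ready`, and the retry defaults, are assumptions chosen for illustration and are not part of the PR.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def wait_until_ready(url, attempts=30, delay=2.0, fetch=fetch_status, sleep=time.sleep):
    """Poll `url` until it answers 200 or the attempts run out.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a running server.
    """
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server is not accepting connections yet
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready("http://localhost:7860/")` before sending synthesis requests should return `True` once the welcome message from the root route is being served.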
|
|
||
| ### 4. Interact with the API | ||
|
|
||
| You can test the API using curl or any API client (like Postman, Insomnia, or your browser for GET requests). The primary endpoint is /synthesize/ which accepts POST requests with multipart/form-data. | ||
|
|
||
| Example curl Request: | ||
|
|
||
| ```bash | ||
| curl -X POST "http://localhost:7860/synthesize/" \ | ||
| -H "accept: application/json" \ | ||
| -H "Content-Type: multipart/form-data" \ | ||
| -F "prompt=This is a test sentence for voice synthesis." \ | ||
| -F "style=default" \ | ||
| -F "audio_file=@/path/to/your/reference_audio.mp3" \ | ||
| -F "agree=true" \ | ||
| --output synthesized_audio.wav | ||
| ``` | ||
|
|
||
| Parameters: | ||
|
|
||
| - prompt (string, required): The text to be synthesized. | ||
| - style (string, required): The speaking style. Supported values: default, whispering, shouting, excited, cheerful, terrified, angry, sad, friendly. (Note: Chinese only supports default). | ||
| - audio_file (file, required): An audio file (.mp3 or .wav) of the reference speaker whose voice you want to clone. | ||
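| - agree (boolean): Intended to confirm acceptance of the terms and conditions. Note: the /synthesize/ handler added in this PR does not declare an agree form field, so the value is currently ignored. | ||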
|
|
||
| The API will return the synthesized audio as a .wav file. | ||
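For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is an illustration under stated assumptions: `encode_multipart` and `synthesize` are hypothetical helper names, and only the `prompt`, `style`, and `audio_file` fields are sent, because those are the fields the handler in this PR declares.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"\r\n'
        f'Content-Type: {file_type}\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def synthesize(base_url, prompt, style, reference_audio, output_path):
    """POST to /synthesize/ and write the returned WAV bytes to `output_path`."""
    reference = Path(reference_audio)
    body, content_type = encode_multipart(
        {"prompt": prompt, "style": style},
        "audio_file",
        reference.name,
        reference.read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url}/synthesize/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        Path(output_path).write_bytes(response.read())
    return output_path
```

With the container running, `synthesize("http://localhost:7860", "This is a test sentence.", "default", "/path/to/your/reference_audio.mp3", "synthesized_audio.wav")` would mirror the curl call above.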
|
|
||
| Output Directory | ||
| Synthesized audio files and temporary processing files will be stored in the outputs/ directory within the container. For local debugging, you might want to mount a volume to persist these outputs on your host machine. | ||
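As a rough sketch of the volume mount mentioned above, the snippet below assembles a `docker run` command with Docker's `-v` flag. The container path `/app/openvoice/outputs` is an assumption derived from the Dockerfile's `WORKDIR /app/openvoice` and the module's relative `outputs` directory, and `docker_run_command` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed container-side location: WORKDIR /app/openvoice plus output_dir 'outputs'.
CONTAINER_OUTPUTS = "/app/openvoice/outputs"


def docker_run_command(host_dir, image="openvoice-fastapi", port=7860):
    """Build the argv for running the image with the outputs directory mounted."""
    host_path = Path(host_dir).resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run",
        "--gpus", "all",
        "-p", f"{port}:{port}",
        "-v", f"{host_path}:{CONTAINER_OUTPUTS}",
        image,
    ]
```

The list can be passed to `subprocess.run(..., check=True)`. Note that the handler in this PR deletes its temporary files and writes every result to the same `output.wav`, so only the most recent synthesis would be visible in the mounted directory.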
|
|
||
| ### 5. Access Swagger Docs | ||
|
|
||
| You can access the Swagger UI documentation by navigating to http://localhost:7860/docs in your web browser. This provides an interactive API reference and allows you to test the API endpoints directly through the UI. |
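The Swagger UI is rendered from the OpenAPI schema that FastAPI serves at `/openapi.json` by default, so the same information can be read programmatically. A small sketch, with `fetch_schema` and `list_operations` as hypothetical helper names:

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_schema(base_url):
    """Download the OpenAPI schema from a running FastAPI server."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)


def list_operations(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)
```

Against the app in this PR, `list_operations(fetch_schema("http://localhost:7860"))` would be expected to list the root route and the `/synthesize/` route.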
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| from fastapi import FastAPI, File, UploadFile, Form, HTTPException | ||
| from fastapi.responses import FileResponse | ||
| import os | ||
| import torch | ||
| import langid | ||
| from openvoice import se_extractor | ||
| from openvoice.api import BaseSpeakerTTS, ToneColorConverter | ||
| import shutil | ||
|
|
||
| app = FastAPI() | ||
|
|
||
| # Configuration from openvoice_app.py | ||
| en_ckpt_base = 'checkpoints/base_speakers/EN' | ||
| zh_ckpt_base = 'checkpoints/base_speakers/ZH' | ||
| ckpt_converter = 'checkpoints/converter' | ||
| device = 'cuda' if torch.cuda.is_available() else 'cpu' | ||
| output_dir = 'outputs' | ||
| os.makedirs(output_dir, exist_ok=True) | ||
|
|
||
| # Load models | ||
| en_base_speaker_tts = BaseSpeakerTTS(f'{en_ckpt_base}/config.json', device=device) | ||
| en_base_speaker_tts.load_ckpt(f'{en_ckpt_base}/checkpoint.pth') | ||
| zh_base_speaker_tts = BaseSpeakerTTS(f'{zh_ckpt_base}/config.json', device=device) | ||
| zh_base_speaker_tts.load_ckpt(f'{zh_ckpt_base}/checkpoint.pth') | ||
| tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device) | ||
| tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth') | ||
|
|
||
| # Load speaker embeddings | ||
| en_source_default_se = torch.load(f'{en_ckpt_base}/en_default_se.pth').to(device) | ||
| en_source_style_se = torch.load(f'{en_ckpt_base}/en_style_se.pth').to(device) | ||
| zh_source_se = torch.load(f'{zh_ckpt_base}/zh_default_se.pth').to(device) | ||
|
|
||
| supported_languages = ['zh', 'en'] | ||
|
|
||
| @app.get("/") | ||
| async def root(): | ||
|     return {"message": "Welcome to the OpenVoice API! Server is up and running!"} | ||
|
|
||
| @app.post("/synthesize/") | ||
| async def synthesize_speech( | ||
|     prompt: str = Form(...), | ||
|     style: str = Form(...), | ||
|     audio_file: UploadFile = File(...), | ||
| ): | ||
|
|
||
|     # Save the uploaded audio file temporarily | ||
|     temp_audio_path = os.path.join(output_dir, audio_file.filename) | ||
|     with open(temp_audio_path, "wb") as buffer: | ||
|         shutil.copyfileobj(audio_file.file, buffer) | ||
|
|
||
|     language_predicted = langid.classify(prompt)[0].strip() | ||
|     print(f"Detected language: {language_predicted}") | ||
|
|
||
|     if language_predicted not in supported_languages: | ||
|         os.remove(temp_audio_path) | ||
|         raise HTTPException(status_code=400, detail=f"The detected language {language_predicted} for your input text is not in our Supported Languages: {supported_languages}") | ||
|
|
||
|     if language_predicted == "zh": | ||
|         tts_model = zh_base_speaker_tts | ||
|         source_se = zh_source_se | ||
|         language = 'Chinese' | ||
|         if style not in ['default']: | ||
|             os.remove(temp_audio_path) | ||
|             raise HTTPException(status_code=400, detail=f"The style {style} is not supported for Chinese, which should be in ['default']") | ||
|     else: | ||
|         tts_model = en_base_speaker_tts | ||
|         if style == 'default': | ||
|             source_se = en_source_default_se | ||
|         else: | ||
|             source_se = en_source_style_se | ||
|         language = 'English' | ||
|         if style not in ['default', 'whispering', 'shouting', 'excited', 'cheerful', 'terrified', 'angry', 'sad', 'friendly']: | ||
|             os.remove(temp_audio_path) | ||
|             raise HTTPException(status_code=400, detail=f"The style {style} is not supported for English, which should be in ['default', 'whispering', 'shouting', 'excited', 'cheerful', 'terrified', 'angry', 'sad', 'friendly']") | ||
|
|
||
|     if len(prompt) < 2: | ||
|         os.remove(temp_audio_path) | ||
|         raise HTTPException(status_code=400, detail="Please give a longer prompt text") | ||
|     if len(prompt) > 200: | ||
|         os.remove(temp_audio_path) | ||
|         raise HTTPException(status_code=400, detail="Text length limited to 200 characters for this demo, please try shorter text.") | ||
|
|
||
|     try: | ||
|         target_se, audio_name = se_extractor.get_se(temp_audio_path, tone_color_converter, target_dir='processed', vad=True) | ||
|     except Exception as e: | ||
|         os.remove(temp_audio_path) | ||
|         raise HTTPException(status_code=500, detail=f"Get target tone color error: {str(e)}") | ||
|
|
||
|     src_path = os.path.join(output_dir, 'tmp.wav') | ||
|     tts_model.tts(prompt, src_path, speaker=style, language=language) | ||
|
|
||
|     save_path = os.path.join(output_dir, 'output.wav') | ||
|     encode_message = "@MyShell" | ||
|     tone_color_converter.convert( | ||
|         audio_src_path=src_path, | ||
|         src_se=source_se, | ||
|         tgt_se=target_se, | ||
|         output_path=save_path, | ||
|         message=encode_message | ||
|     ) | ||
|
|
||
|     # Clean up temporary files | ||
|     os.remove(temp_audio_path) | ||
|     os.remove(src_path) | ||
|
|
||
|     return FileResponse(save_path, media_type="audio/wav", filename="synthesized_audio.wav") | ||
|
|
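The handler above saves the upload under the client-supplied filename, validates afterwards (repeating `os.remove` before each error), and writes every request to the same `tmp.wav` and `output.wav`. One possible way to address this is sketched below as two pure helpers that could run before any file is written. The names `validate_request`, `request_paths`, and `SUPPORTED_STYLES` are hypothetical and not part of the PR; the style lists and length limits are copied from the handler.

```python
import os
import uuid

# Styles per detected language, copied from the checks in the handler.
SUPPORTED_STYLES = {
    "zh": ["default"],
    "en": ["default", "whispering", "shouting", "excited", "cheerful",
           "terrified", "angry", "sad", "friendly"],
}
MIN_PROMPT_CHARS = 2
MAX_PROMPT_CHARS = 200


def validate_request(prompt, style, language):
    """Return an error message for an invalid request, or None if it is valid."""
    if language not in SUPPORTED_STYLES:
        return f"The detected language {language} is not supported"
    if style not in SUPPORTED_STYLES[language]:
        return f"The style {style} is not supported for language {language}"
    if len(prompt) < MIN_PROMPT_CHARS:
        return "Please give a longer prompt text"
    if len(prompt) > MAX_PROMPT_CHARS:
        return "Text length limited to 200 characters"
    return None


def request_paths(output_dir, upload_filename):
    """Build per-request file paths so concurrent requests do not collide."""
    request_id = uuid.uuid4().hex
    # Keep only the extension of the client-supplied name; the rest is untrusted.
    extension = os.path.splitext(os.path.basename(upload_filename or ""))[1]
    return {
        "upload": os.path.join(output_dir, f"{request_id}_ref{extension}"),
        "tts": os.path.join(output_dir, f"{request_id}_tmp.wav"),
        "output": os.path.join(output_dir, f"{request_id}_output.wav"),
    }
```

In the endpoint, a non-`None` result from `validate_request` could be turned into a single `HTTPException(status_code=400, ...)` before the upload is written to disk, which would remove the need for the repeated cleanup calls.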