forked from pipecat-ai/pipecat
[VIDPA-576] bring changes to public fork #1
Draft: asilvestre wants to merge 46 commits into `main` from `feature/VIDPA-576/bring_changes_to_public_fork`.
Commits (46):
- `b15e72e` vamis: Modifying readme to point to original
- `4db3ff3` abhind-a: Add: Secret scanner
- `73ea461` varunps2003: first version of serializer, processor and example
- `fe1d7eb` varunps2003: fixed typos
- `eab3a2b` asilvestre: working but with the websocket transport
- `532fc15` varunps2003: removed the websocket server changes
- `0e0bee2` varunps2003: added new line
- `6368a86` sampleref: [VIDMC-1262] - Using custom ws out transport
- `e0e0839` varunps2003: improved version of serializer and transport with test example
- `3240dcd` varunps2003: added the trasport service
- `a2d37bb` varunps2003: ruff formatting done
- `6604265` varunps2003: further improvemnts in code formatting
- `5786a2b` varunps2003: ruff linter errors solved
- `41ae3e4` varunps2003: fixed ruff linter issues
- `b964038` varunps2003: ruff linter issues resolved
- `6070d38` varunps2003: adding a new line
- `9672df3` varunps2003: adding function comments
- `9b932f6` varunps2003: fixed the sleep interval
- `8303ef7` varunps2003: improved the audio quality
- `84682d1` varunps2003: named changed of macro
- `b6b1095` varunps2003: renamed macro
- `73d9c99` varunps2003: small correction
- `6009d09` varunps2003: ruff fomatter fixed
- `57f0583` varunps2003: Merge pull request #17 from opentok/feat/VIDMC-1262/vonage-integration
- `fc140c1` asilvestre: feat(audio-connector): [VIDMP-1268] prepare files to have two differe…
- `31370b7` asilvestre: feat(native-sdk): [VIDMP-1268] Adding vonage transport for the native…
- `4bfaf31` varunps2003: [VIDMC-1293]: Vonage First Example using the pipecat-ai openai for st…
- `78160af` varunps2003: [VIDMC-1296] : second example speech to speech example added (#35)
- `c685d5e` asilvestre: [VIDMP-1268] fixes to native sdk wrapper transport (#36)
- `917feee` odivorra: Update README.md (#44)
- `e4570f2` asilvestre: [VIDMP-1385] add events and adapt to new API (#42)
- `f6d7212` asilvestre: [VIDMP-1383] Vonage video webrtc transport: adding tests and prepare …
- `1206f3a` asilvestre: Merge branch 'main' of github.com:opentok/vonage-pipecat into upstrea…
- `61bf9be` asilvestre: Merge branch 'main' into upstream_v0.0.85
- `03641b0` asilvestre: Merge pull request #54 from opentok/upstream_v0.0.85
- `4a4d793` asilvestre: [VIDPA-221] 2.18.0_rc into main (#56)
- `4edf39d` varunps2003: Merge v0.0.90: adopt websockets-base extra, bump webrtc pins; keep vo…
- `95eba9b` varunps2003: corrected the toml for sonic
- `0f8dc06` varunps2003: changed the aws sonic dependency version as required
- `08a2bbd` varunps2003: fixed further the aws sonic dependency
- `2353991` varunps2003: Merge pull request #61 from opentok/feature/VIDMC-1393/merge_latest_0…
- `23e94fa` varunps2003: [VIDMC-1394] added the changes to support the examples (#62)
- `a42d86a` asilvestre: [VIDPA-431] ensure sha in wheel name is correct (#66)
- `e7bfa23` asilvestre: [VIDPA-431] put version in artifact bundle name (#67)
- `71f7447` asilvestre: fix rebase
- `5e32072` rafayhameed: [VIDMC-1463] Add sample app for vonage sdk (#64)
New file (GitHub Actions secret-scan workflow, +21 lines):

```yaml
name: secret_scan
on:
  pull_request:
    branches:
      - 'main'
  push:
    branches:
      - 'main'

permissions:
  contents: read
  issues: write

jobs:
  scan_secrets_on_pull_request:
    if: github.event_name == 'pull_request' && github.event.pull_request.base.ref == github.event.repository.default_branch
    uses: opentok/application-security-secret-scanner/.github/workflows/secret_scanner_on_pr.yml@main

  scan_secrets_on_push:
    if: github.event_name == 'push' && github.ref_name == github.event.repository.default_branch
    uses: opentok/application-security-secret-scanner/.github/workflows/secret_scanner_on_push.yaml@main
```
New file (+1 line):

```
2.19.0
```
New file: `examples/foundational/40a-aws-nova-sonic-vonage-video-webrtc.py` (130 additions, 0 deletions):

```python
# Copyright 2025 Vonage
"""Example of using AWS Nova Sonic LLM service with Vonage Video WebRTC transport."""

import asyncio
import json
import os
import sys

from loguru import logger

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services import aws_nova_sonic
from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService
from pipecat.transports.vonage.video_webrtc import (
    VonageVideoWebrtcTransport,
    VonageVideoWebrtcTransportParams,
)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main(session_str: str):
    """Main entry point for the Nova Sonic Vonage Video WebRTC example."""
    system_instruction = (
        "You are a friendly assistant. The user and you will engage in a spoken dialog exchanging "
        "the transcripts of a natural real-time conversation. Keep your responses short, generally "
        "two or three sentences for chatty scenarios. "
        f"{AWSNovaSonicLLMService.AWAIT_TRIGGER_ASSISTANT_RESPONSE_INSTRUCTION}"
    )
    chans = 1
    in_sr = 16000
    out_sr = 24000

    session_obj = json.loads(session_str)
    application_id = session_obj.get("apiKey", "")
    session_id = session_obj.get("sessionId", "")
    token = session_obj.get("token", "")

    transport = VonageVideoWebrtcTransport(
        application_id,
        session_id,
        token,
        VonageVideoWebrtcTransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
            publisher_name="TTS bot",
            audio_in_sample_rate=in_sr,
            audio_in_channels=chans,
            audio_out_sample_rate=out_sr,
            audio_out_channels=chans,
        ),
    )

    ns_params = aws_nova_sonic.aws.Params()
    ns_params.input_sample_rate = in_sr
    ns_params.output_sample_rate = out_sr
    ns_params.input_channel_count = chans
    ns_params.output_channel_count = chans

    llm = AWSNovaSonicLLMService(
        secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", ""),
        access_key_id=os.getenv("AWS_ACCESS_KEY_ID", ""),
        region=os.getenv("AWS_REGION", ""),
        session_token=os.getenv("AWS_SESSION_TOKEN", ""),
        voice_id="tiffany",
        params=ns_params,
    )
    context = OpenAILLMContext(
        messages=[
            {"role": "system", "content": f"{system_instruction}"},
            {
                "role": "user",
                "content": "Tell me a fun fact!",
            },
        ],
    )
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline(
        [
            transport.input(),
            context_aggregator.user(),
            llm,
            transport.output(),
        ]
    )

    task = PipelineTask(pipeline, observers=[TranscriptionLogObserver()])

    # Handle client connection event
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Client connected")
        await task.queue_frames([LLMRunFrame()])
        # HACK: for now, we need this special way of triggering the first assistant response in AWS
        # Nova Sonic. Note that this trigger requires a special corresponding bit of text in the
        # system instruction. In the future, simply queueing the context frame should be sufficient.
        await llm.trigger_assistant_response()

    runner = PipelineRunner()

    await runner.run(task)


def cli_main():
    """Console script entry point for the Nova Sonic Vonage Video WebRTC example."""
    if len(sys.argv) > 1:
        session_str = sys.argv[1]
        logger.info(f"Session str: {session_str}")
    else:
        logger.error(f"Usage: {sys.argv[0]} <VONAGE_SESSION_STR>")
        logger.error("VONAGE_SESSION_STR should be a JSON string with the following format:")
        logger.error(
            '{"apiKey": "your_api_key", "sessionId": "your_session_id", "token": "your_token"}'
        )
        sys.exit(1)

    asyncio.run(main(session_str))


if __name__ == "__main__":
    cli_main()
```
New file: `examples/vonage-chatbot/Dockerfile` (+30 lines):

```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.12-bullseye

# Set the working directory in the container (repo root inside the image)
WORKDIR /vonage-chatbot

# Install ffmpeg for pydub at runtime
RUN apt-get update && \
    apt-get install -y --no-install-recommends ffmpeg && \
    rm -rf /var/lib/apt/lists/*

# Copy the example's requirements file into the container (for layer caching)
COPY examples/vonage-chatbot/requirements.txt ./requirements.txt

# Install any needed packages specified in requirements.txt
RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Copy the entire repo so local src/pipecat/* is available
COPY . .

# Install the local pipecat package (so imports like pipecat.serializers.vonage work)
RUN pip install -e ".[openai,websocket,vonage,silero,runner]"

# Expose the desired port (WebSocket server)
EXPOSE 8005

# Run the application from the example directory
WORKDIR /vonage-chatbot/examples/vonage-chatbot
CMD ["python", "server.py"]
```
New file (example README, +157 lines):

# Vonage Chatbot (Pipecat)

A real-time voice chatbot built with **Pipecat AI** and the **Vonage Audio Connector** over **WebSocket**. This project streams caller audio to **OpenAI STT**, processes the conversation with an LLM, converts the AI's response to speech via **OpenAI TTS**, and streams it back to the caller in real time. The server exposes a WebSocket endpoint (via **VonageAudioConnectorTransport**) that the Vonage **/connect API** connects to, bridging a live session into the **OpenAI STT → LLM → TTS** pipeline.

## Table of Contents

- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Expose Local Server with ngrok](#expose-local-server-with-ngrok)
- [Configure Vonage Voice](#configure-vonage-voice)
- [Running the Application](#running-the-application)
- [Testing the Chatbot](#testing-the-chatbot)

## Features

- **Real-time audio** to/from Vonage over WebSocket
- **OpenAI-powered** STT → LLM → TTS pipeline
- **Silero VAD** for accurate talk-pause detection
- **Dockerized** for easy deployment

## Requirements

- Python **3.10+**
- A **Vonage account**
- An **OpenAI API key**
- **ngrok** (or any HTTPS tunnel) for local testing
- Docker (optional)

## Installation

1. **Clone the repo and enter it**:

   ```sh
   git clone https://github.com/opentok/vonage-pipecat.git
   cd vonage-pipecat/
   ```

2. **Set up a virtual environment** (recommended):

   ```sh
   python -m venv .venv
   source .venv/bin/activate  # Windows: .venv\Scripts\activate
   ```

3. **Install Pipecat AI (editable mode)**:

   ```sh
   pip install -e ".[openai,websocket,vonage,silero,runner]"
   ```

4. **Install example dependencies**:

   ```sh
   cd examples/vonage-chatbot
   pip install -r requirements.txt
   ```

5. **Create a .env file**:

   Copy the example environment file and update it with your settings:

   ```sh
   cp env.example .env
   ```

6. **Add your OpenAI key to .env**:

   ```sh
   OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx
   # Do not include quotes ("")
   ```
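   The server is expected to read this key from the environment at startup. A minimal sketch, assuming a hypothetical helper (not part of the example code), that also catches the accidental-quotes mistake called out above:

   ```python
   import os


   def load_openai_key() -> str:
       """Read OPENAI_API_KEY and reject values wrapped in quotes in .env."""
       key = os.getenv("OPENAI_API_KEY", "")
       if not key:
           raise RuntimeError("OPENAI_API_KEY is not set")
       if key.startswith(('"', "'")):
           raise RuntimeError("Remove the quotes around OPENAI_API_KEY in .env")
       return key
   ```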

7. **Install ngrok**:

   Follow the instructions on the [ngrok website](https://ngrok.com/download) to download and install ngrok. You'll use it to securely expose your local WebSocket server for testing.

## Expose Local Server with ngrok

1. **Start ngrok**:

   In a new terminal, start ngrok to tunnel the local server:

   ```sh
   ngrok http 8005
   ```

   You'll see output like:

   ```sh
   Forwarding https://a5db22f57efa.ngrok-free.app -> http://localhost:8005
   ```

   The `https://` address is your public ngrok domain. To build the WebSocket Secure (WSS) URL for Vonage, replace `https://` with `wss://`.

   Example:

   ```sh
   "websocket": {
     "uri": "wss://a5db22f57efa.ngrok-free.app",
     "audioRate": 16000,
     "bidirectional": true
   }
   ```
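   The scheme substitution above can be sketched as a tiny helper (hypothetical, for illustration only):

   ```python
   def to_wss(forwarding_url: str) -> str:
       """Turn the https:// ngrok forwarding URL into the wss:// URI Vonage expects."""
       if not forwarding_url.startswith("https://"):
           raise ValueError("expected an https:// ngrok URL")
       return "wss://" + forwarding_url[len("https://"):]
   ```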

## Configure Vonage Voice

1. Open the **Vonage Video API Playground** (or your own application).
2. Create a new session and publish the stream.
3. Make a POST request to:

   ```sh
   /v2/project/{apiKey}/connect
   ```

4. Include the following in the JSON body:
   - `sessionId`
   - `token`
   - the WebSocket URI from ngrok (e.g. `"wss://a5db22f57efa.ngrok-free.app"`)
   - `"audioRate": 16000`
   - `"bidirectional": true`
5. This connects your Vonage session to your locally running Pipecat WebSocket server through ngrok.
6. For a working example of the /connect API request, see [Testing the Chatbot](#testing-the-chatbot).
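The steps above can be sketched in Python. The base URL, the `X-OPENTOK-AUTH` JWT header, and the helper names are assumptions to verify against the Vonage API docs; only the body fields mirror this README:

```python
import json
import urllib.request

# Assumption: verify the API base URL and auth scheme against the Vonage docs.
VONAGE_API_BASE = "https://api.opentok.com"


def build_connect_body(session_id: str, token: str, ws_uri: str) -> dict:
    """Assemble the JSON body described in steps 3-4 above."""
    return {
        "sessionId": session_id,
        "token": token,
        "websocket": {
            "uri": ws_uri,
            "audioRate": 16000,
            "bidirectional": True,
        },
    }


def connect_audio(api_key: str, auth_jwt: str, session_id: str, token: str, ws_uri: str) -> dict:
    """POST the connect request (network call, untested sketch)."""
    req = urllib.request.Request(
        f"{VONAGE_API_BASE}/v2/project/{api_key}/connect",
        data=json.dumps(build_connect_body(session_id, token, ws_uri)).encode(),
        headers={"Content-Type": "application/json", "X-OPENTOK-AUTH": auth_jwt},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```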

## Running the Application

Choose one of the following methods to start the chatbot server.

### Option 1: Run with Python

**Run the server application**:

```sh
# Ensure you're in examples/vonage-chatbot and your virtual environment is active
python server.py
```

### Option 2: Run with Docker

1. **Build the Docker image**:

   ```sh
   docker build -f examples/vonage-chatbot/Dockerfile -t vonage-chatbot .
   ```

2. **Run the Docker container**:

   ```sh
   docker run -it --rm -p 8005:8005 --env-file examples/vonage-chatbot/.env vonage-chatbot
   ```

The server starts on port 8005. Keep it running while you test with Vonage.

## Testing the Chatbot

1. Start publishing audio in the Vonage Playground.
2. Follow `examples/vonage-chatbot/client/README.md` and run `connect_and_stream.py`.

Once the connection is established, speak. Your audio runs through the STT → LLM → TTS pipeline and you'll hear an AI-generated voice reply.
Review comment: But these examples would not go to the pipecat main repo, as there is now a different repo for them.
Reply: I thought we should bring all the changes as they are now in the private repo into this repo and then iterate. However, in today's sync meeting we discussed that we might keep the private repo.