An AI-powered voice bot that simulates Phone Guy from Five Nights at Freddy's, capable of handling real phone calls via SIP protocol with natural voice conversation.
- Real-time Voice Conversations - Handle incoming and outgoing SIP phone calls
- AI-Powered Responses - Uses NVIDIA NIM API (GLM 4.7) for intelligent, in-character responses
- Voice Cloning - RVC (Retrieval-based Voice Conversion) transforms TTS output into Phone Guy's voice
- Speech Recognition - faster-whisper for accurate speech-to-text
- Text-to-Speech - Chatterbox TTS with multilingual support
- Conversation Logging - All calls are automatically logged to
logs/directory - Pre-generation - Greeting is generated while phone is ringing for faster response
- GPU Acceleration - CUDA support for faster inference
- Asynchronous Architecture - Built on asyncio for efficient concurrent operations
- GPU: NVIDIA GPU with CUDA support and 12+ GB of VRAM (recommended for RVC and TTS)
- RAM: Minimum 8GB, 16GB+ recommended
- Storage: 5GB+ free space for models
- OS: Linux (Ubuntu 22.04+ recommended) or Windows
- Python: 3.11 (And only 3.11)
- SIP Server: PBX server (Asterisk, FreeSWITCH, etc.) or SIP provider
Tested on Ubuntu 24.04 + RTX 3090 and FreePBX (Asterisk 22)
git clone https://github.com/Sergey004/Phone_Guy.git
cd Phone_Guypython -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activateOR (on conda is looks better IMHO)
conda create -n phoneguy python=3.11
conda activate phoneguypip install -r requirements.txtCopy the example environment file:
cp .env.example .envEdit .env with your settings:
# SIP Configuration
SIP_DOMAIN=pbx.example.com
SIP_PORT=5060
SIP_USER=phoneguy
SIP_PASSWORD=secret123
SIP_SERVER=pbx.example.com:5060
# NVIDIA LLM
NVIDIA_API_KEY=your_key_here
NVIDIA_MODEL=meta/llama3-70b-instruct
# Optional: Outbound call (comment out or leave empty for incoming only)
# TARGET_NUMBER=1001
# TTS Settings
TTS_ENGINE=turbo
TTS_DEVICE=cuda
# RVC Voice Model (optional)
RVC_ENABLED=true
AUDIO_PROMPT_PATH=ai_core/models/RVC/PhoneGuyFNAF1/PhoneGuy_FNAF1_01.wav
RVC_MODEL_PATH=ai_core/models/RVC/PhoneGuyFNAF1/PhoneGuyFNAF1_e1000_s22000.pth
RVC_INDEX_PATH=ai_core/models/RVC/PhoneGuyFNAF1/added_IVF339_Flat_nprobe_1_PhoneGuyFNAF1_v2.index
RVC_F0_METHOD=rmvpe
RVC_PITCH_SHIFT=0
RVC_INDEX_RATE=0.6All settings can be configured via .env. The main entry point is main_integration.py:
# SIP settings are read from .env
SIP_USER = os.getenv('SIP_USER', '555533')
SIP_PASS = os.getenv('SIP_PASSWORD', 'Test1234')
SIP_SERVER = os.getenv('SIP_SERVER', '192.168.1.176:5060').split(':')[0]
LOCAL_IP = "192.168.1.181"
# For incoming calls: leave TARGET_NUMBER unset in .env
# For outgoing calls: set TARGET_NUMBER=123456789 in .env
TARGET_NUMBER = os.getenv('TARGET_NUMBER')Phone_Guy/
βββ main_integration.py # Main entry point
βββ telephony/ # SIP/RTP telephony module
β βββ __init__.py
β βββ sip_rtp_client.py # SIP/RTP protocol handler
β βββ bridge.py # Audio bridge for RTP
β βββ audio_engine.py # Audio processing utilities
β βββ audio_codecs.py # Codec implementations
β βββ wav player.py # WAV file player
β
βββ ai_core/ # AI and voice processing module
β βββ __init__.py
β βββ ai_service.py # NVIDIA LLM integration
β βββ ai_config.py # AI prompts and configuration
β βββ stt_adapter.py # Speech-to-Text (Whisper)
β βββ tts_adapter.py # Text-to-Speech (Chatterbox)
β βββ document_processor.py # RAG document processing
β βββ convert_audio.py # Audio conversion utilities
β βββ rvc_py/ # RVC voice conversion module
β β βββ rvc_infer.py # RVC inference function
β β βββ rvc_model.py # RVC model class
β β βββ download_models.py # Model downloader
β β βββ lib/ # RVC internal libraries
β β
β βββ models/ # Voice models (RVC)
β βββ RVC/
β βββ PhoneGuyFNAF1/ # Example voice model
β βββ *.pth
β βββ *.index
β βββ *.wav
β
βββ knowledge_base/ # RAG documents
βββ chroma_db/ # Vector database (auto-created)
βββ user_memories/ # User memory storage (auto-created)
βββ logs/ # Call logs (auto-created)
βββ .env.example # Environment variables template
βββ .env # Your configuration
βββ requirements.txt # Python dependencies
βββ run.sh # Startup script
βββ README.md # This file
source .venv/bin/activate
OR
conda activate phoneguy
python main_integration.pyLeave TARGET_NUMBER unset (commented out or empty) in .env. The bot will:
- Register with the SIP server
- Wait for incoming calls
- Generate greeting while phone rings
- Answer and start conversation
- Log the entire call to
logs/call_YYYY-MM-DD_HH-MM-SS.txt
Set TARGET_NUMBER=123456789 in .env. The bot will:
- Register with the SIP server
- Initiate call to the specified number
- Start conversation when connected
- Log the call
Press Ctrl+C to gracefully stop the bot.
To enable voice cloning (Phone Guy's voice), you need RVC models.
Place RVC models in ai_core/models/RVC/:
ai_core/models/RVC/
βββ PhoneGuyFNAF1/
βββ PhoneGuyFNAF1_e1000_s22000.pth # Trained RVC model
βββ added_IVF339_Flat_nprobe_1_PhoneGuyFNAF1_v2.index # Faiss index
βββ PhoneGuy_FNAF1_01.wav # Reference audio
RVC_ENABLED=true
AUDIO_PROMPT_PATH=ai_core/models/RVC/PhoneGuyFNAF1/PhoneGuy_FNAF1_01.wav
RVC_MODEL_PATH=ai_core/models/RVC/PhoneGuyFNAF1/PhoneGuyFNAF1_e1000_s22000.pth
RVC_INDEX_PATH=ai_core/models/RVC/PhoneGuyFNAF1/added_IVF339_Flat_nprobe_1_PhoneGuyFNAF1_v2.index
RVC_F0_METHOD=rmvpe
RVC_PITCH_SHIFT=0
RVC_INDEX_RATE=0.6Some RVC features require additional models:
cd ai_core/rvc_py
python download_models.pyYou can find RVC models at:
- Hugging Face
- On Web
Problem: Bot fails to register with SIP server
Solutions:
- Check SIP credentials in
.env - Verify SIP server is reachable:
telnet SIP_SERVER 5060 - Check firewall rules for UDP port 5060
- Ensure
LOCAL_IPis correctly set
Problem: Poor speech recognition or TTS quality
Solutions:
- Use Whisper
mediumorlargemodel for better STT - Increase
target_sample_rateto 16000 or 24000 - Adjust
energy_thresholdin STT config - Use CUDA for better TTS performance
Problem: Voice conversion fails or doesn't work
Solutions:
- Verify RVC model paths are correct in
.env - Check if model file is corrupted
- Ensure all RVC dependencies are installed
- Try different
RVC_F0_METHOD:rmvpe,dio,harvest - Check CUDA availability:
python -c "import torch; print(torch.cuda.is_available())"
Problem: AI responses fail
Solutions:
- Verify
NVIDIA_API_KEYin.env - Check API key is valid and has credits
- Ensure internet connection
- Try different model:
meta/llama3-8b-instruct
Problem: Slow response times
Solutions:
- Use GPU acceleration (CUDA)
- Use smaller Whisper model (
tinyorbase) - Use
turboTTS engine - Reduce conversation history size
- Close other GPU-intensive applications
All conversations are automatically logged to the logs/ directory:
logs/
βββ call_2024-03-15_14-30-22.txt
Log format:
=== CALL STARTED AT 2024-03-15_14-30-22 ===
[14:30:25] Phone Guy: Uh, hello? Hello, hello? [clear throat] I wanted to record a message for you.
[14:30:32] User: Hello, who is this?
[14:30:35] Phone Guy: Oh, uh, this is Phone Guy. Did you just get hired?
=== CALL ENDED ===
Contributions are welcome! Feel free to:
- Report bugs
- Suggest new features
- Submit pull requests
- Improve documentation
This project is licensed under the MIT License - see the LICENSE file for details.
- NVIDIA NIM - AI inference
- Chatterbox TTS - Text-to-Speech
- faster-whisper - Speech Recognition
- RVC-Project - Voice Conversion
- Voice files - Only WAV file (covert from OGG and trimed), RVC files found somewhere on the Internet (the author said not to mention him)
For issues and questions:
- Open an issue on GitHub
- Check existing issues for solutions
- Review the troubleshooting section
And why did I do this? Why, tell me?
And who needs it anyway? I made garbage that no one needs. Yes, I'm whining because I spent so many hours getting this crap working, replacing three SIP libraries that I had to write my own. Yes, it's funny that the AI ββaudio goes straight to the RTP stream. Yes, it's cool that it says something and responds and even saves who you are and what you are, but this is simply a toy. Other projects of this format would be better than this.