A full-stack web application that performs real-time audio transcription and visualizes the speaker's emotional sentiment through a beautiful, dynamic Perlin noise field.
Try it out here: https://voice-agent-ruddy.vercel.app/
- Real-time Audio Transcription: Uses Deepgram API for high-accuracy speech-to-text
- AI-Powered Sentiment Analysis: Leverages OpenAI GPT-4 or Anthropic Claude to extract sentiment, emotion, and keywords
- Dynamic Perlin Noise Visualization: Gorgeous generative art that responds to emotional data
- Color shifts based on sentiment type (joy, calm, anxiety, anger, etc.)
- Particle energy reflects emotional intensity
- Flow field dynamics respond to energy levels
- Smooth Animations: Keywords fade in gracefully, transcripts auto-scroll
- Modern UI: Clean, semi-transparent overlays with glassmorphism effects
This is a three-part system:
- Frontend (React): Captures audio, manages WebSocket connections, displays UI and visualization
- Backend (FastAPI): Proxy server that securely calls AI APIs for sentiment analysis
- External APIs:
- Deepgram for real-time transcription
- OpenAI/Claude for sentiment and keyword extraction
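The backend's main job is to forward transcript text to an LLM and hand the frontend a structured sentiment payload. As a minimal sketch of that response handling (the field names `sentiment`, `intensity`, and `keywords` are assumptions for illustration, not necessarily the actual schema in `main.py`):

```python
import json

def parse_sentiment_response(raw: str) -> dict:
    """Normalize an LLM's JSON reply into the fields the frontend consumes.

    The schema here (sentiment / intensity / keywords) is illustrative;
    check main.py for the real contract.
    """
    data = json.loads(raw)
    intensity = float(data.get("intensity", 0.5))
    return {
        "sentiment": data.get("sentiment", "neutral"),
        "intensity": max(0.0, min(1.0, intensity)),   # clamp to [0, 1]
        "keywords": list(data.get("keywords", []))[:10],  # cap keyword count
    }
```

Clamping the intensity and capping the keyword list keeps the visualization stable even when the model returns out-of-range values.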
- Node.js (v16 or higher)
- Python 3.8+
- Deepgram API Key ($200 credits available)
- OpenAI API Key or Anthropic API Key
cd backend
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
# Create a .env file in the backend directory:
echo "OPENAI_API_KEY=your_key_here" > .env
# OR
echo "ANTHROPIC_API_KEY=your_key_here" > .env
# Run the server
python main.py
The backend will be available at http://localhost:8000
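Once the server is up, you can smoke-test the sentiment endpoint from Python. Note the route name `/process_text` and the `{"text": ...}` payload are assumptions inferred from `sentimentService.js`; check `main.py` for the actual contract.

```python
import json
from urllib.request import Request, urlopen

def build_request(text: str, base_url: str = "http://localhost:8000") -> Request:
    """Build a POST request carrying the transcript text as JSON.
    The /process_text route is assumed, not confirmed."""
    return Request(
        f"{base_url}/process_text",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def analyze(text: str) -> dict:
    """Send the request and decode the JSON sentiment response."""
    with urlopen(build_request(text)) as resp:
        return json.loads(resp.read())
```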
cd frontend
# Install dependencies
npm install
# Set up environment variables
# Create a .env file in the voice-agent directory:
echo "REACT_APP_DEEPGRAM_API_KEY=your_key_here" > .env
echo "REACT_APP_API_URL=http://localhost:8000" >> .env
# Start the development server
npm start
The app will open at http://localhost:3000
- Click "Start Recording" to begin capturing audio
- Speak naturally - your words will appear in the transcript panel
- Watch the visualization respond to your emotional tone
- Keywords will fade in smoothly on the right panel
- Click "Stop Recording" when finished
The Perlin noise visualization maps sentiment data to visual parameters:
- Color (Hue):
  - Joyful/Happy: Yellow-orange (45°)
  - Calm/Peaceful: Blue (200°)
  - Anxious/Nervous: Purple (280°)
  - Angry: Red (0°)
  - Sad: Deep blue (220°)
  - Surprised: Cyan (160°)
  - Loving: Pink (330°)
- Saturation: Increases with emotion intensity
- Brightness: Higher for more intense emotions
- Particle Speed: Scales with energy level (calm → energetic)
- Flow Field Complexity: More turbulent for higher energy
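The hue/saturation/brightness mapping above can be sketched as a small lookup-plus-scaling function. This is an illustrative Python sketch, not the actual logic in `PerlinVisualization.js` (which runs in p5.js); the neutral fallback hue of 200° is an assumption:

```python
# Map a sentiment label and intensity (0..1) to HSB visual parameters,
# following the hue table above.
SENTIMENT_HUES = {
    "joyful": 45, "happy": 45,
    "calm": 200, "peaceful": 200,
    "anxious": 280, "nervous": 280,
    "angry": 0,
    "sad": 220,
    "surprised": 160,
    "loving": 330,
}

def sentiment_to_hsb(label: str, intensity: float) -> tuple:
    """Return (hue, saturation, brightness); unknown labels fall back to
    a neutral blue (assumed default)."""
    hue = SENTIMENT_HUES.get(label.lower(), 200)
    saturation = 40 + 60 * intensity   # more intense -> more saturated
    brightness = 60 + 40 * intensity   # more intense -> brighter
    return hue, saturation, brightness
```

Keeping saturation and brightness on a floor (40 and 60 here) means even low-intensity speech stays visible rather than fading to gray.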
voice-agent/
├── src/
│ ├── components/
│ │ ├── PerlinVisualization.js # P5.js Perlin noise field
│ │ ├── TranscriptDisplay.js # Live transcript panel
│ │ ├── KeywordsDisplay.js # Animated keywords
│ │ └── Controls.js # Start/Stop buttons
│ ├── hooks/
│ │ └── useDeepgram.js # Deepgram WebSocket management
│ ├── services/
│ │ └── sentimentService.js # Backend API calls
│ ├── App.js # Main application
│ └── index.js # Entry point
├── public/
└── package.json
backend/
├── main.py # FastAPI server
├── requirements.txt # Python dependencies
└── .env # API keys (create this)
By default, the app uses OpenAI GPT-4. To switch to Claude:
- Update `sentimentService.js` to call `/process_text_claude`
- Set `ANTHROPIC_API_KEY` in the backend `.env`
Adjust parameters in `PerlinVisualization.js`:
- `particlesRef.current`: Number of particles (default: 500)
- Flow field resolution: `cols` and `rows` (default: 20px grid)
- Color transition speed: `0.05` in the lerp calculation
- Trail effect: Alpha value in the background rect
Microphone not working:
- Ensure HTTPS or localhost (mic requires secure context)
- Check browser permissions for microphone access
Backend connection failed:
- Verify backend is running on port 8000
- Check CORS settings in `main.py`
- Ensure `.env` variables are set correctly
Deepgram errors:
- Verify API key is valid and has credits
- Check internet connection for WebSocket
No visualization:
- Open browser console for errors
- Ensure `react-p5` and `p5` are installed
- Frontend: React, react-p5, p5.js, Deepgram SDK, Axios
- Backend: FastAPI, OpenAI/Anthropic SDK, Uvicorn
- APIs: Deepgram (transcription), OpenAI GPT-4 / Claude (sentiment)
- Styling: CSS3 with glassmorphism, animations, gradients
This project is built as a technical demonstration for Memory Machines.
Made with ❤️ for Memory Machines - Going Beyond LLMs