Runs a local FastAPI server that acts as a real-time voice bridge.
Architecture
Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
|
+--> OpenAI (LLM)
+--> ElevenLabs (TTS)
Prerequisites
- Twilio Account: Phone number + TwiML App.
- Deepgram API Key: For fast speech-to-text.
- OpenAI API Key: For the conversation logic.
- ElevenLabs API Key: For realistic text-to-speech.
- Ngrok (or similar): To expose your local port 8080 to Twilio.
Setup
- Install Dependencies:
pip install -r scripts/requirements.txt
- Set Environment Variables (in
~/.moltbot/.env, ~/.clawdbot/.env, or export):
export DEEPGRAM_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export ELEVENLABS_API_KEY="your_key"
export TWILIO_ACCOUNT_SID="your_sid"
export TWILIO_AUTH_TOKEN="your_token"
export PORT=8080
- Start the Server:
python3 scripts/server.py
- Expose to Internet:
ngrok http 8080
- Configure Twilio:
- Go to your Phone Number settings.
- Set "Voice & Fax" -> "A Call Comes In" to
Webhook.
- URL:
https://<your-ngrok-url>.ngrok.io/incoming
- Method:
POSTUsage
Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.
Customization
- System Prompt: Edit
SYSTEM_PROMPT in scripts/server.py to change the persona.
- Voice: Change
ELEVENLABS_VOICE_ID to use different voices.
- Model: Switch
gpt-4o-mini to gpt-4 for smarter (but slower) responses.