Voice Calendar Scheduler FSM

A voice-driven assistant that schedules apartment viewings around the clock, built on Twilio/WebRTC audio, an 8-step FSM, and Google Calendar integration.

https://github.com/davidbmar/voice-calendar-scheduler-FSM  ·  public  ·  shipped


What it is

An automated voice agent that handles inbound calls or browser connections to schedule apartment viewings. It uses a Finite State Machine to guide users through preference gathering, listing search via RAG, availability checking, and final booking on Google Calendar, powered by Faster-Whisper for speech-to-text and Piper for text-to-speech.
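
The stages named above (preference gathering, listing search, availability checking, booking) map naturally onto a transition table. The following is a minimal sketch, not the project's FSM: the state names, event names, and the `step` and `run` helpers are all hypothetical, and the project's actual eight steps are not listed on this page.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names; the real FSM is described as having eight steps.
    GATHER_PREFERENCES = auto()
    SEARCH_LISTINGS = auto()
    CHECK_AVAILABILITY = auto()
    BOOK_VIEWING = auto()
    DONE = auto()


# Allowed transitions: (current state, event) -> next state.
# Event names are assumptions made for this illustration.
TRANSITIONS = {
    (State.GATHER_PREFERENCES, "preferences_complete"): State.SEARCH_LISTINGS,
    (State.SEARCH_LISTINGS, "listing_chosen"): State.CHECK_AVAILABILITY,
    (State.SEARCH_LISTINGS, "no_match"): State.GATHER_PREFERENCES,
    (State.CHECK_AVAILABILITY, "slot_chosen"): State.BOOK_VIEWING,
    (State.CHECK_AVAILABILITY, "no_slot"): State.SEARCH_LISTINGS,
    (State.BOOK_VIEWING, "booked"): State.DONE,
}


def step(state: State, event: str) -> State:
    """Return the next state, or stay in place if the event is not valid here."""
    return TRANSITIONS.get((state, event), state)


def run(events: list[str], start: State = State.GATHER_PREFERENCES) -> State:
    """Feed a sequence of events through the machine and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Keeping the transitions in a single table means an unexpected event (for example, a caller asking to book before choosing a listing) leaves the conversation in its current state instead of skipping a step.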

Features

Quickstart

git clone --recursive https://github.com/davidbmar/voice-calendar-scheduler-FSM
cd voice-calendar-scheduler-FSM
./scripts/setup.sh
cp .env.example .env
$EDITOR .env
./scripts/start.sh

Architecture

flowchart TD
    Caller[Caller Phone or Browser]
    Twilio[Twilio PSTN]
    WebRTC[Browser WebRTC]
    MediaStream[TwilioMediaStreamChannel]
    Signaling[WebRTC Signaling WS]
    Session[SchedulingSession]
    STT[STT Faster Whisper]
    FSM[FSM Orchestrator]
    TTS[TTS Piper]
    RAG[RAG Service LanceDB]
    GCal[Google Calendar API]
    LLM[LLM Claude or Ollama]
    Caller -->|PSTN| Twilio
    Caller -->|WebRTC| WebRTC
    Twilio --> MediaStream
    WebRTC --> Signaling
    MediaStream --> Session
    Signaling --> Session
    Session --> STT
    Session --> FSM
    Session --> TTS
    FSM --> RAG
    FSM --> GCal
    FSM --> LLM
    LLM --> FSM

How it's built

The backend is written in Python 3.11+ with FastAPI. Twilio Media Streams carry PSTN audio and WebRTC carries browser audio. An engine, included as a git submodule, provides FSM orchestration, the LLM abstraction (Claude or Ollama), and voice processing. Apartment data is indexed in LanceDB for RAG-based search, and bookings are created through the Google Calendar API.
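
One common way to support two LLM providers behind a single interface is a small protocol plus a factory. The sketch below illustrates that idea only: `LLMBackend`, `EchoBackend`, and `make_backend` are hypothetical names, the stand-in backend makes no network calls, and the engine submodule's real API may look quite different.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical interface; the engine submodule's real API may differ."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for local testing; it never contacts a real model."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        # Tag the reply with the backend name so callers can see which one ran.
        return f"[{self.name}] {prompt}"


def make_backend(provider: str) -> LLMBackend:
    """Pick a backend by provider name (the names here are assumptions)."""
    backends = {
        "claude": EchoBackend("claude"),
        "ollama": EchoBackend("ollama"),
    }
    try:
        return backends[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {provider!r}") from None
```

With this shape, the FSM only ever calls `complete`, so swapping a hosted model for a local one becomes a configuration change.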

How it runs

sequenceDiagram
    participant Caller
    participant Channel as Voice Channel
    participant Session as Scheduling Session
    participant FSM as FSM Orchestrator
    participant Tool as External Tools
    Caller->>Channel: Speak Audio
    Channel->>Session: Stream PCM Audio
    Session->>FSM: Process Input
    FSM->>Tool: Query Listings or Calendar
    Tool-->>FSM: Return Data
    FSM->>Session: Generate Response Text
    Session->>Channel: Synthesize Speech
    Channel->>Caller: Play Audio
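
The turn shown in the diagram can be sketched as three composed stages: transcribe, respond, synthesize. This is an illustration under stated assumptions, not the project's `SchedulingSession`: `TurnPipeline` and its fields are hypothetical, and the lambdas below stand in for Faster-Whisper, the FSM, and Piper so the example runs without any models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnPipeline:
    """One conversational turn: audio in, audio out. All names are illustrative."""

    transcribe: Callable[[bytes], str]   # stands in for speech-to-text
    respond: Callable[[str], str]        # stands in for the FSM orchestrator
    synthesize: Callable[[str], bytes]   # stands in for text-to-speech

    def handle_turn(self, pcm_audio: bytes) -> bytes:
        text = self.transcribe(pcm_audio)
        reply = self.respond(text)
        return self.synthesize(reply)


# Toy stages: treat the audio bytes as UTF-8 text so the flow is easy to follow.
pipeline = TurnPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Because each stage is an injected callable, the same pipeline could serve both the Twilio and the WebRTC channel, and each stage can be replaced with a fake in tests.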

How to apply & reuse

Clone the repository recursively so the engine submodule is included. Run the setup script to create a virtual environment and install dependencies. Copy .env.example to .env and set the API keys for the LLM, Twilio, and Google Calendar. Then run the start script, which launches the RAG service, the backend, and the optional visual editor.
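
Before starting the services, it can help to confirm that the required settings are present. The check below is a minimal sketch: the variable names in `REQUIRED_KEYS` are hypothetical placeholders, so consult .env.example for the names the project actually uses.

```python
import os
from typing import Mapping, Sequence

# Hypothetical variable names; check .env.example for the real ones.
REQUIRED_KEYS = ("LLM_API_KEY", "TWILIO_AUTH_TOKEN", "GOOGLE_CALENDAR_ID")


def missing_keys(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_KEYS
) -> list[str]:
    """Return the required keys that are absent or blank in the given mapping."""
    return [key for key in required if not env.get(key, "").strip()]


def check_environment() -> None:
    """Raise a readable error if the process environment is incomplete."""
    missing = missing_keys(os.environ)
    if missing:
        raise RuntimeError(f"Missing configuration: {', '.join(missing)}")
```

Calling `check_environment()` at startup turns a missing key into one clear error message instead of a failure in the middle of a call.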

At a glance

Capabilities: Voice interaction handling, Finite State Machine orchestration, Retrieval Augmented Generation, Calendar management, WebRTC signaling, Twilio media streaming
Components: Engine Submodule, Scheduling App, Gateway Server, RAG Service, Visual Editor
Tech: Python, FastAPI, Twilio, WebRTC, Faster-Whisper, Piper TTS, LanceDB, Google Calendar API, Docker
Depends on: Python 3.11+, Git, Docker, Node.js
Integrates with: Twilio, Google Calendar, Anthropic Claude, Ollama, LanceDB
Patterns: Finite State Machine, Retrieval Augmented Generation, WebSocket Signaling, Microservices, Event Driven
Reuse tags: voice-agent, scheduling, fsm, rag, twilio, webrtc, calendar-integration
