Voice agent platform that generates deterministic FSM-based voice agents from plain-English business descriptions.
https://github.com/davidbmar/riff · private · shipped
riff is a Python platform for building voice agents that pair large language models (LLMs), which handle natural-language understanding, with finite state machines (FSMs) declared in YAML, which enforce the workflow. Users describe a business in plain text, and riff generates a voice flow for tasks such as scheduling, ordering, and intake. Deterministic guardrails and slot validation constrain what the LLM can say and do, guarding against hallucinated steps or values.
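To make "declared FSM plus slot validation" concrete, here is a minimal sketch, assuming a flow file that lists states, their allowed next states, and per-slot regex patterns. The flow layout, the `FLOW` dict (standing in for what a YAML loader such as `yaml.safe_load` might return), and the helpers `validate_slots` and `missing_slots` are all hypothetical illustrations, not riff's actual schema or API.

```python
import re

# Hypothetical flow, shown as the dict a YAML loader might return
# for a small plumbing-appointment flow. Not riff's real schema.
FLOW = {
    "name": "plumber_booking",
    "initial": "greet",
    "states": {
        "greet": {"next": ["collect_details"]},
        "collect_details": {
            "slots": {
                "name": {"pattern": r"[A-Za-z][A-Za-z .'-]{0,49}"},
                "phone": {"pattern": r"\+?\d{10,15}"},
                "date": {"pattern": r"\d{4}-\d{2}-\d{2}"},
            },
            "next": ["confirm"],
        },
        "confirm": {"next": ["done"]},
        "done": {"next": []},
    },
}


def validate_slots(flow: dict, state: str, slots: dict[str, str]) -> dict[str, str]:
    """Return an error message for every slot value that fails validation.

    Slots that the state does not declare are rejected as well, so the
    LLM cannot invent fields the flow never asked for.
    """
    declared = flow["states"][state].get("slots", {})
    errors = {}
    for slot, value in slots.items():
        spec = declared.get(slot)
        if spec is None:
            errors[slot] = "not declared for this state"
        elif not re.fullmatch(spec["pattern"], value):
            errors[slot] = f"value {value!r} does not match {spec['pattern']}"
    return errors


def missing_slots(flow: dict, state: str, filled: dict[str, str]) -> list[str]:
    """List the slots declared for `state` that have not been filled yet."""
    declared = flow["states"][state].get("slots", {})
    return [slot for slot in declared if slot not in filled]
```

For example, `validate_slots(FLOW, "collect_details", {"phone": "555-0100", "date": "2025-03-14"})` flags `phone` (the hyphen and short length fail the pattern) and accepts `date`. Because validation is plain regex matching, the same input always produces the same verdict, which is the property the deterministic layer relies on.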
cd ~/src/riff
.venv/bin/python3 -m pytest tests/ -q
.venv/bin/python3 -m riff.web_server
flowchart TD
Caller[Caller] --> STT[STT]
STT --> RunTurn[run_turn]
RunTurn --> Guardrails[Guardrails]
RunTurn --> LLMCall[LLM Call]
RunTurn --> StateManager[State Manager]
Guardrails --> RunTurn
LLMCall --> RunTurn
StateManager --> RunTurn
RunTurn --> TurnResult[TurnResult]
TurnResult --> SlotExtractor[Slot Extractor]
TurnResult --> EvalFramework[Eval Framework]
TurnResult --> TurnLogger[Turn Logger]
TurnResult --> TTS[TTS]
TTS --> Caller
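The flowchart centers on `run_turn`, which consults guardrails, the LLM, and the state manager before producing a `TurnResult`. A minimal sketch of that loop follows, assuming the LLM is any callable that returns a reply, a proposed next state, and newly extracted slots. `run_turn` and `TurnResult` are names taken from the diagram, but their fields and signatures here, along with `TRANSITIONS`, `BLOCKED_PHRASES`, `FALLBACK`, `next_state`, and `passes_guardrails`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical transition table: state -> states the FSM allows next.
TRANSITIONS = {
    "greet": {"collect_details"},
    "collect_details": {"confirm"},
    "confirm": {"done"},
    "done": set(),
}
BLOCKED_PHRASES = ("guarantee", "free of charge")  # hypothetical guardrail list
FALLBACK = "Sorry, let me get a few more details first."

# The LLM is modeled as: (state, transcript) -> (reply, proposed_state, new_slots).
LLM = Callable[[str, str], tuple[str, str, dict[str, str]]]


@dataclass(frozen=True)
class TurnResult:
    state: str                # FSM state after this turn
    reply: str                # text handed to TTS
    slots: dict[str, str]     # slots collected so far
    overridden: bool = False  # True if a guardrail or the FSM replaced the LLM reply


def next_state(state: str, proposed: str) -> str:
    """Pure function: accept the proposed state only if it is a declared transition."""
    if proposed == state or proposed in TRANSITIONS.get(state, set()):
        return proposed
    return state


def passes_guardrails(reply: str) -> bool:
    """Deterministic check: reject replies containing blocked phrases."""
    lowered = reply.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def run_turn(state: str, slots: dict[str, str], transcript: str, llm: LLM) -> TurnResult:
    reply, proposed, new_slots = llm(state, transcript)
    target = next_state(state, proposed)
    if target != proposed or not passes_guardrails(reply):
        # Either the LLM tried an undeclared jump or said something off-policy:
        # keep the current state and slots, and speak a safe fallback instead.
        return TurnResult(state, FALLBACK, dict(slots), overridden=True)
    return TurnResult(target, reply, {**slots, **new_slots})
```

The key design choice in this sketch is that the LLM only ever *proposes*; the pure `next_state` function decides. If the model claims "You're booked!" while the flow is still in `greet`, the illegal jump is caught and the caller hears the fallback rather than a false confirmation.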
The system is built in Python with a modular architecture in which the LLM handles conversational nuance ('riffs') and a state manager enforces business logic ('keeps the beat'). Core components include a YAML-based flow loader, a pure-function state manager, deterministic guardrails, and adapters for speech-to-text (STT), text-to-speech (TTS), and LLM providers (Gemini, Gemma, Claude). It uses an event bus for internal communication and JSONL for turn logging.
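The event bus and JSONL turn log can be pictured as a simple publish/subscribe pair: components publish events, and a logger subscribed to turn events appends one JSON object per line. This is a minimal in-process sketch under that assumption; `EventBus`, `JsonlTurnLogger`, `read_turns`, and the topic name `turn.completed` are hypothetical, and riff's actual bus and log format may differ.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Minimal synchronous publish/subscribe bus (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: dict[str, Any]) -> None:
        # Copy the list so a handler may subscribe others without breaking iteration.
        for handler in list(self._handlers[topic]):
            handler(payload)


class JsonlTurnLogger:
    """Appends one JSON object per turn, so logs can be streamed or grepped line by line."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def __call__(self, turn: dict[str, Any]) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(turn, ensure_ascii=False) + "\n")


def read_turns(path: Path) -> list[dict[str, Any]]:
    """Load every logged turn, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Usage: `bus.subscribe("turn.completed", JsonlTurnLogger(Path("turns.jsonl")))`, then `bus.publish("turn.completed", {"turn": 1, "state": "greet"})` after each turn. JSONL suits this job because each turn is appended independently, so a crash mid-call loses at most the line being written, and evaluation tools can read the file one turn at a time.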
sequenceDiagram
participant User as Caller
participant STT as STT Service
participant Server as riff Web Server
participant FSM as State Manager
participant LLM as LLM Provider
participant TTS as TTS Service
User->>STT: Speaks utterance
STT->>Server: Sends text transcript
Server->>FSM: Get current state context
FSM->>Server: Return state definition
Server->>LLM: Send prompt with context
LLM->>Server: Return generated response
Server->>FSM: Validate transition and slots
FSM->>Server: Confirm valid transition
Server->>TTS: Synthesize speech
TTS->>User: Plays audio response
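The "Validate transition and slots" step in the sequence can be sketched as a deterministic check on structured LLM output. This sketch assumes the LLM returns JSON shaped like `{"say": ..., "next_state": ..., "slots": {...}}` and that each state declares which slots must be filled before the call may leave it. That JSON shape, `RULES`, `Verdict`, and `validate_llm_output` are hypothetical, not riff's documented format.

```python
import json
from dataclasses import dataclass

# Hypothetical per-state rules: allowed next states, and slots that must be
# filled before the FSM lets the call leave that state.
RULES = {
    "collect_details": {"next": {"confirm"}, "required": {"name", "phone", "date"}},
    "confirm": {"next": {"done"}, "required": set()},
}
NO_RULES = {"next": set(), "required": set()}


@dataclass
class Verdict:
    ok: bool
    reason: str = ""


def validate_llm_output(state: str, filled: dict[str, str], raw: str) -> Verdict:
    """Accept the LLM's proposal only if it is well-formed and the FSM allows it."""
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(False, "LLM output is not valid JSON")
    if not isinstance(out, dict) or not isinstance(out.get("say"), str):
        return Verdict(False, "missing 'say' text")
    new_slots = out.get("slots", {})
    if not isinstance(new_slots, dict):
        return Verdict(False, "'slots' must be an object")

    rule = RULES.get(state, NO_RULES)
    target = out.get("next_state", state)
    if target == state:
        return Verdict(True)  # staying put is always allowed
    if target not in rule["next"]:
        return Verdict(False, f"transition {state} -> {target} is not declared")
    missing = rule["required"] - set({**filled, **new_slots})
    if missing:
        return Verdict(False, f"missing slots: {sorted(missing)}")
    return Verdict(True)
```

On a failed verdict, the server would re-prompt or fall back instead of sending the reply to TTS, so the caller never hears a confirmation for a booking the FSM has not reached.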
Use riff to rapidly prototype and deploy voice agents for service businesses (plumbing, dental, salons) or retail scenarios (coffee, pizza). It suits developers who need reliable voice interactions that resist LLM hallucination, without hand-coding complex state machines. The platform supports both local development with open-source models and cloud deployment with proprietary APIs.