Riff

Voice agent platform that generates deterministic FSM-based voice agents from plain English business descriptions.

https://github.com/davidbmar/riff  ·  private  ·  shipped

What it is

riff is a Python platform for building voice agents. It pairs a Large Language Model, which handles natural language understanding, with a Finite State Machine declared in YAML, which enforces the workflow. Users describe a business in plain text, and riff generates a voice flow for tasks such as scheduling, ordering, and intake. Deterministic guardrails and slot validation keep LLM hallucinations from altering that workflow.
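
To make the slot validation idea concrete, here is a minimal sketch of a deterministic check applied to values an LLM has extracted. The slot names, the rules, and the `validate_slots` helper are all hypothetical and are not taken from riff's code; the point is only that a value reaches the workflow after passing a plain, non-LLM check.

```python
import re
from typing import Callable, Dict

# Hypothetical validators keyed by slot name (illustrative rules only).
SLOT_VALIDATORS: Dict[str, Callable[[str], bool]] = {
    # Accept a phone number if exactly 10 digits remain after stripping punctuation.
    "phone": lambda v: re.fullmatch(r"\d{10}", re.sub(r"\D", "", v)) is not None,
    # Accept a party size between 1 and 20.
    "party_size": lambda v: v.isdecimal() and 1 <= int(v) <= 20,
}


def validate_slots(proposed: Dict[str, str]) -> Dict[str, str]:
    """Return only the slots that are declared and pass their validator."""
    accepted: Dict[str, str] = {}
    for name, value in proposed.items():
        check = SLOT_VALIDATORS.get(name)
        if check is not None and check(value):
            accepted[name] = value
    return accepted


# Values an LLM might propose: one valid, one out of range, one undeclared.
proposed = {"phone": "(512) 555-0100", "party_size": "99", "favorite_color": "blue"}
accepted = validate_slots(proposed)
```

In this sketch, undeclared slots are dropped rather than passed through, so anything the model invents cannot reach the state machine.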

Features

Quickstart

cd ~/src/riff
.venv/bin/python3 -m pytest tests/ -q
.venv/bin/python3 -m riff.web_server

Architecture

flowchart TD
    Caller[Caller] --> STT[STT]
    STT --> RunTurn[run_turn]
    RunTurn --> Guardrails[Guardrails]
    RunTurn --> LLMCall[LLM Call]
    RunTurn --> StateManager[State Manager]
    Guardrails --> RunTurn
    LLMCall --> RunTurn
    StateManager --> RunTurn
    RunTurn --> TurnResult[TurnResult]
    TurnResult --> SlotExtractor[Slot Extractor]
    TurnResult --> EvalFramework[Eval Framework]
    TurnResult --> TurnLogger[Turn Logger]
    TurnResult --> TTS[TTS]
    TTS --> Caller

How it's built

The system is built in Python using a modular architecture where the LLM handles conversational nuance ('riffs') and a state manager enforces business logic ('keeps the beat'). Core components include a YAML-based flow loader, a pure-function state manager, deterministic guardrails, and adapters for STT/TTS and LLM providers (Gemini, Gemma, Claude). It uses an event bus for internal communication and JSONL for turn logging.
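
As a rough illustration of a pure-function state manager paired with JSONL turn logging, consider the sketch below. The `FLOW` dictionary stands in for a loaded YAML flow, and `Session`, `next_state`, and `log_turn` are hypothetical names chosen for this example; riff's actual schema and function signatures may differ.

```python
import io
import json
from dataclasses import dataclass, replace
from typing import Dict, TextIO, Tuple

# Hypothetical flow, shaped like what a YAML flow file might load into.
FLOW: Dict[str, Dict[str, Dict[str, str]]] = {
    "greeting": {"on": {"book": "collect_time"}},
    "collect_time": {"on": {"confirm": "done", "cancel": "greeting"}},
    "done": {"on": {}},
}


@dataclass(frozen=True)
class Session:
    state: str = "greeting"


def next_state(flow: dict, session: Session, intent: str) -> Tuple[Session, bool]:
    """Pure function: no I/O, no mutation. Returns (new session, allowed?)."""
    target = flow[session.state]["on"].get(intent)
    if target is None:
        return session, False  # undeclared transition: stay put
    return replace(session, state=target), True


def log_turn(stream: TextIO, before: Session, intent: str,
             after: Session, allowed: bool) -> None:
    """Append one turn as a single JSON object per line (JSONL)."""
    record = {"from": before.state, "intent": intent,
              "to": after.state, "allowed": allowed}
    stream.write(json.dumps(record) + "\n")


# Usage: one allowed transition, then one the flow does not declare.
start = Session()
booked, ok_book = next_state(FLOW, start, "book")
stuck, ok_pizza = next_state(FLOW, booked, "order_pizza")

log = io.StringIO()  # stands in for an open log file
log_turn(log, start, "book", booked, ok_book)
log_turn(log, booked, "order_pizza", stuck, ok_pizza)
```

Because `next_state` takes the flow and session as arguments and returns new values, it can be tested without any STT, TTS, or LLM provider running.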

How it runs

sequenceDiagram
    participant User as Caller
    participant STT as STT Service
    participant Server as riff Web Server
    participant FSM as State Manager
    participant LLM as LLM Provider
    participant TTS as TTS Service
    User->>STT: Speaks utterance
    STT->>Server: Sends text transcript
    Server->>FSM: Get current state context
    FSM->>Server: Return state definition
    Server->>LLM: Send prompt with context
    LLM->>Server: Return generated response
    Server->>FSM: Validate transition and slots
    FSM->>Server: Confirm valid transition
    Server->>TTS: Synthesize speech
    TTS->>User: Plays audio response
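
The request and validation steps in the sequence above can be sketched as a single turn function. The names `run_turn` and `TurnResult` appear in the architecture diagram, but the signature, the fields, the `ALLOWED` table, and the stub LLM below are assumptions made for illustration, not riff's real interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class TurnResult:
    state: str      # state after the turn
    reply: str      # text that would be sent to TTS
    accepted: bool  # whether the LLM's proposed transition was allowed


# Hypothetical transition table for a coffee-order flow.
ALLOWED: Dict[str, Set[str]] = {
    "greeting": {"collect_order"},
    "collect_order": {"confirm"},
    "confirm": set(),
}
FALLBACK_REPLY = "Sorry, could you say that again?"

LLM = Callable[[str, str], Dict[str, str]]


def run_turn(state: str, transcript: str, llm: LLM) -> TurnResult:
    """Ask the LLM for a proposal, then let the transition table decide."""
    proposal = llm(state, transcript)
    target = proposal.get("next_state")
    if target in ALLOWED.get(state, set()):
        return TurnResult(target, proposal.get("reply", ""), True)
    # Rejected: keep the current state and discard the LLM's reply.
    return TurnResult(state, FALLBACK_REPLY, False)


def stub_llm(state: str, transcript: str) -> Dict[str, str]:
    """Stand-in for a real provider; sometimes proposes an undeclared state."""
    if "latte" in transcript.lower():
        return {"next_state": "collect_order", "reply": "One latte. Anything else?"}
    return {"next_state": "refund_issued", "reply": "I have refunded your order."}


good = run_turn("greeting", "A latte, please", stub_llm)
bad = run_turn("greeting", "Give me my money back", stub_llm)
```

In this sketch a rejected proposal discards the model's reply as well as its transition, so the caller never hears a promise the workflow did not authorize.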

How to apply & reuse

Use riff to rapidly prototype and deploy voice agents for service businesses (plumbing, dental, salons) or retail scenarios (coffee, pizza). It is suitable for developers who need reliable, hallucination-free voice interactions without manually coding complex state machines. The platform supports local development with open-source models or cloud deployment with proprietary APIs.

At a glance

Capabilities: Plain English to Voice Flow Generation, Deterministic State Machine Enforcement, Multi-Provider LLM Integration, Real-time Session Inspection, A/B Testing Harness, Silent Failure Observability
Components: Flow Loader, State Manager, Guardrails Engine, Session Store, Web Server, Event Bus, Slot Extractor, Turn Logger
Tech: Python, YAML, Mermaid, JSONL, Pytest, FastAPI, WebSockets
Depends on: Google Gemini API, Gemma Local Model, Claude API, STT Service, TTS Service
Integrates with: Model Context Protocol (MCP), Browser Web UI, Local File System, Git Version Control
Patterns: LLM Riffs FSM Beats, Pure Function Guardrails, Event Bus Cancellation, LRU Cache Bounding, Silent Failure Tracking, Observable Metric Scoring
Reuse tags: voice-agent, fsm-orchestration, llm-guardrails, python-platform, yaml-configuration, deterministic-workflow
