RIFF · Product Status

Path to Revenue

What it takes to turn RIFF — a working AI voice agent — into a paid after-hours answering service, priced like a night-shift worker: $1 / hour on shift (8pm–8am = $12 a night, unlimited calls).

Status update · 2026-06-18 · figures illustrative, pending verified vendor pricing

RIFF already answers real phone calls: Telnyx routes the call, Gemini Live handles the voice, and a deterministic state machine drives the conversation through YAML-defined flows (booking, inquiries, message-taking). The conversation engine is production-grade. The business wrapper around it — billing, multi-tenant accounts, message hand-off, secrets/auth — is the work that stands between today and the first paying customer.
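As a rough illustration of what "deterministic state machine driven by a flow definition" means, here is a minimal sketch. The flow shape, state names, and event names are all hypothetical, not RIFF's actual YAML schema; the dict stands in for what a parsed YAML flow might look like.

```python
# Hypothetical flow definition (what a parsed YAML file might produce).
# State and event names are invented for illustration only.
MESSAGE_FLOW = {
    "start": "greet",
    "states": {
        "greet": {"on": {"caller_spoke": "collect_name"}},
        "collect_name": {"on": {"name_captured": "collect_message"}},
        "collect_message": {"on": {"message_captured": "confirm"}},
        "confirm": {"on": {"confirmed": "done", "rejected": "collect_message"}},
        "done": {"on": {}},
    },
}


def next_state(flow: dict, state: str, event: str) -> str:
    """Return the next state; unknown events leave the state unchanged."""
    return flow["states"][state]["on"].get(event, state)


def run(flow: dict, events: list) -> str:
    """Replay a sequence of events from the start state and return the final state."""
    state = flow["start"]
    for event in events:
        state = next_state(flow, state, event)
    return state
```

Because the transition table is plain data, the same inputs always produce the same path, which is what makes flows like this testable without a live model in the loop.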

The model: a night-shift worker at $1/hour

You hire RIFF for a shift the way you'd hire a person — $1 for every hour it's on the line, billed for the window, not per call or per minute. A typical 8pm–8am shift is 12 hours = $12. The customer pays for coverage (hours staffed); RIFF only spends money on the talk minutes when calls actually come in. That gap is the margin.

$12: a full 8pm–8am night, unlimited calls

~$0.15: illustrative cost per coverage-hour (to be verified)

~35 min: talk-minutes per staffed hour before $1/hr loses money

At moderate volume the margin is large; the live risk is a single line that rings nonstop. The companion spreadsheet (riff_cost_per_hour.csv) recalculates all of this from real numbers as soon as vendor prices are filled in.
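The margin arithmetic can be sketched as a small calculation. The per-talk-minute cost used here is an assumption back-derived from the illustrative ~35 min break-even (about $1 / 35), not a verified vendor price. The model is purely linear and ignores fixed costs such as number rental, and the function names are hypothetical.

```python
PRICE_PER_HOUR = 1.00   # customer price per staffed hour (from this document)
SHIFT_HOURS = 12        # 8pm-8am

# ASSUMPTION: back-derived from the illustrative ~35 min break-even, not a vendor quote.
ILLUSTRATIVE_COST_PER_TALK_MINUTE = PRICE_PER_HOUR / 35


def breakeven_talk_minutes(price_per_hour: float, cost_per_talk_minute: float) -> float:
    """Talk-minutes per staffed hour at which variable cost equals price."""
    return price_per_hour / cost_per_talk_minute


def nightly_margin(
    talk_minutes_per_hour: float,
    cost_per_talk_minute: float,
    price_per_hour: float = PRICE_PER_HOUR,
    shift_hours: int = SHIFT_HOURS,
) -> float:
    """Revenue for the shift minus variable talk-minute cost for the shift."""
    revenue = price_per_hour * shift_hours
    cost = talk_minutes_per_hour * cost_per_talk_minute * shift_hours
    return revenue - cost
```

Under that assumed rate, about 5.25 talk-minutes per staffed hour reproduces the ~$0.15 cost per coverage-hour, and a line that talks 35 minutes of every hour earns nothing for the night.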

How $12/night sits against the market

No competitor prices per shift-hour — the "hire a worker for the night" framing is open whitespace. The two flanks:

Alternative	What you pay	Catch
Live human / answering service	$250–$720/mo, or $15–25/hr	Per-minute meter + night surcharge; one 10-min 2am call ≈ $20–30
Budget AI bots (Rosie, Dialzara)	$29–$49/mo	Cheaper sticker, but cap you at 60–250 min/mo
RIFF	$1/hr ($12/night)	Dedicated coverage for the window, unlimited calls

Positioning: not the cheapest sticker — the cheapest dedicated worker. The value line: one after-hours call to a human service can cost a whole RIFF night. Lead with reliability (won't hallucinate, takes accurate messages), not price — research shows "too cheap" reads as "untrustworthy" for calls people care about. Competitor pricing verified on vendor sites, June 2026.

Where RIFF stands today

Capability	Status	Notes
Voice conversation engine	Ready	Telnyx + Gemini Live + deterministic FSM, ~4,800 tests
Flow authoring (YAML)	Ready	Booking, inquiry, message-taking flows exist
Calendar / SMS providers	Ready	Pluggable, degrade gracefully when down
Secrets & config	Partial	Local `.env` (not committed); needs a prod secret store
Observability	Partial	On-disk logs; no central dashboard or alerts
Human message hand-off	Stub	The core answering-service job — must be finished
Multi-tenant accounts	Missing	Single business / number / calendar today
Auth on control plane	Missing	Call transcripts readable without auth
Usage metering & billing	Missing	Tokens measured internally; no per-customer billing

P0 Blockers — can't take money without these

Verify unit economics: Fill the cost model with real Telnyx + Gemini Live prices. Confirms $1/hr clears cost. Everything else depends on this answer.
Multi-tenancy: A tenant_id on every session, per-tenant config (flow, business name, calendar, phone-number routing), tenant-scoped storage. Serve customer #2 without a code change.
Secrets off the laptop: Prod secret store + key rotation; stop reading a shared dev .env in production.
Auth on the web / control plane: Bearer-token auth; scope every transcript read by tenant. Today anyone on the network can read calls.
Finish human message hand-off: Reliably take a message and deliver it (SMS/email) or escalate emergencies. This is the actual product, not a nice-to-have.
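To make the multi-tenancy and auth items concrete, here is a minimal sketch of a tenant-scoped transcript read behind a bearer token. Everything in it is hypothetical: the in-memory token table and transcript list stand in for a real secret store and database, and the names are not taken from the RIFF codebase.

```python
import hmac
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transcript:
    tenant_id: str
    call_id: str
    text: str


# Hypothetical stand-ins: real tokens would come from the prod secret store.
TOKENS = {"tok-acme": "acme", "tok-bravo": "bravo"}
TRANSCRIPTS = [
    Transcript("acme", "c1", "Caller left a message about a leak."),
    Transcript("bravo", "c2", "Caller asked about opening hours."),
]


def tenant_for_token(authorization_header: Optional[str]) -> Optional[str]:
    """Map an 'Authorization: Bearer <token>' header to a tenant_id, or None."""
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return None
    presented = authorization_header[len("Bearer "):].encode()
    for token, tenant_id in TOKENS.items():
        # Constant-time comparison avoids leaking token contents via timing.
        if hmac.compare_digest(presented, token.encode()):
            return tenant_id
    return None


def list_transcripts(authorization_header: Optional[str]) -> List[Transcript]:
    """Return only the caller's own transcripts; reject unauthenticated reads."""
    tenant_id = tenant_for_token(authorization_header)
    if tenant_id is None:
        raise PermissionError("missing or invalid bearer token")
    return [t for t in TRANSCRIPTS if t.tenant_id == tenant_id]
```

The point of the sketch is that the tenant comes from the credential, never from a request parameter, so a valid token for one business cannot read another's calls.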

P1 Required for a reliable paid service

Graceful shutdown & call draining: A deploy mid-call must not drop a customer. Add SIGTERM/SIGINT handlers that drain active calls.
Observability you can debug at 3am: A /healthz endpoint, structured logs shipped somewhere queryable, alerts on call errors and degraded providers.
Usage metering → billing: Per-tenant call-minute accounting; wire Stripe. Can't bill hours you can't count.
Concurrency / load shedding: Decide behavior past the call ceiling (currently hardcoded at 5): queue, busy, or scale.
CI gate: Run the ~4,800-test suite on every PR so "tests pass" is enforced, not assumed.
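The draining and call-ceiling items can share one small admission gate. The sketch below is an assumption about how this could look, not RIFF's existing code: `CallGate` and `install_signal_handlers` are hypothetical names, and calls over the ceiling are simply rejected (the "busy" option).

```python
import asyncio
import signal


class CallGate:
    """Admits calls up to a ceiling and refuses new ones once draining starts."""

    def __init__(self, max_calls: int = 5):
        self.max_calls = max_calls
        self.active = 0
        self.draining = False

    def try_admit(self) -> bool:
        """Return True and count the call if it may start, else False (busy)."""
        if self.draining or self.active >= self.max_calls:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        """Mark one call as finished."""
        if self.active > 0:
            self.active -= 1

    def begin_drain(self) -> None:
        """Stop admitting new calls; calls already in progress continue."""
        self.draining = True

    async def wait_idle(self, poll_seconds: float = 0.05, timeout: float = 30.0) -> bool:
        """Wait until no calls are active; return False if the timeout expires first."""
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while self.active > 0:
            if loop.time() >= deadline:
                return False
            await asyncio.sleep(poll_seconds)
        return True


def install_signal_handlers(loop: asyncio.AbstractEventLoop, gate: CallGate) -> None:
    """Start draining on SIGTERM/SIGINT (loop.add_signal_handler is Unix-only)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, gate.begin_drain)
```

On shutdown, the process would call `begin_drain()` via the signal handler, await `wait_idle()`, and only then exit, so a deploy never cuts an in-progress call unless the timeout is hit.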

P2 Scale & polish — after first customers

Deployment artifact: Real Dockerfile + compose so a new box isn't manual.
Distributed session store: Move SQLite → Postgres for multi-instance HA.
Self-serve tenant onboarding: A flow-authoring UI so you're not hand-writing YAML per customer.
Compliance: Call-recording consent, data retention, PII handling. Varies by state.

Cost-optimization lever: Qwen cascade

Today's voice path uses Gemini Live native audio — the most natural-sounding option, but priced on audio tokens (the dominant cost). An alternative path already partly built in RIFF: a cascade of speech-to-text → text LLM → text-to-speech.

Qwen is already wired (riff/adapters/dashscope.py) — used today only for evaluation, not live calls.
A cascade could run cheap local TTS (tts_server.py, Piper ≈ free) + local STT (Whisper) + a Qwen text model via DashScope. Text tokens cost an order of magnitude less than native-audio tokens, and local TTS removes the priciest line entirely.
Trade-off: higher turn latency and a slightly more robotic voice than Gemini Live — generally acceptable for nighttime message-taking, less so for fast back-and-forth.
Note: "Qwen Max" is a text model — it replaces the brain, not the ears/mouth. A cheaper Qwen tier likely suffices for a constrained answering flow. Swapping in Qwen means committing to the cascade architecture, which isn't yet the primary phone path.

Next: Model both paths side-by-side in the cost sheet (Scenario A: Gemini Live · Scenario B: Qwen cascade) once live Qwen/DashScope pricing is confirmed.
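A side-by-side comparison could be structured along these lines. This is a sketch only: the `Scenario` fields are a hypothetical breakdown, and the numbers at the bottom are made-up placeholders chosen to exercise the code, not vendor prices. Real values belong in riff_cost_per_hour.csv once confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    telephony_per_min: float  # inbound carrier cost, $ per talk-minute
    model_per_min: float      # model tokens converted to $ per talk-minute
    speech_per_min: float     # separate STT + TTS cost; 0.0 if local or bundled

    @property
    def cost_per_talk_minute(self) -> float:
        return self.telephony_per_min + self.model_per_min + self.speech_per_min


def cost_per_coverage_hour(scenario: Scenario, talk_minutes_per_hour: float) -> float:
    """Variable cost of one staffed hour at a given talk-minute volume."""
    return scenario.cost_per_talk_minute * talk_minutes_per_hour


def cheaper(a: Scenario, b: Scenario) -> Scenario:
    """Return whichever scenario costs less per talk-minute (ties go to a)."""
    return a if a.cost_per_talk_minute <= b.cost_per_talk_minute else b


# PLACEHOLDER inputs only: NOT real Telnyx, Gemini Live, or DashScope prices.
scenario_a = Scenario("A: Gemini Live", 0.01, 0.03, 0.0)
scenario_b = Scenario("B: Qwen cascade", 0.01, 0.003, 0.0)
```

Keeping telephony as its own field makes it visible that the cascade only reduces the model and speech lines; the carrier cost is the same in both scenarios.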

Immediate next step

Pull live vendor pricing (Telnyx inbound per-minute, Gemini Live audio tokens, Qwen/DashScope tokens) into riff_cost_per_hour.csv. That single number — cost per coverage-hour — tells us whether $1/hr is a business or a hobby, and whether the Qwen cascade is worth building.