Browser Text To Speech TTS Realtime

A client-side, real-time neural text-to-speech engine using WebAssembly and ONNX.

https://github.com/davidbmar/Browser-Text-to-Speech-TTS-Realtime  ·  public  ·  shipped


What it is

This project is a browser-based text-to-speech system that runs entirely on the client. It runs VITS neural TTS models through the WebAssembly build of ONNX Runtime, which enables streaming audio playback, multi-core parallel processing, and offline use with no server costs or API keys.

Features

Quickstart

git clone https://github.com/davidbmar/Browser-Text-to-Speech-TTS-Realtime.git
cd Browser-Text-to-Speech-TTS-Realtime
npm install
npm run dev

Architecture

flowchart TD
    User[User Interface]
    Hook[useTTS Hook]
    Engine[TTS Engine Class]
    Splitter[Sentence Splitter]
    ModelLoader[Model Loader]
    WASM[WebAssembly Runtime]
    ONNX[ONNX Inference]
    AudioPlayer[HTML Audio Element]
    Cache[IndexedDB Cache]
    User --> Hook
    Hook --> Engine
    Engine --> Splitter
    Engine --> ModelLoader
    ModelLoader --> Cache
    ModelLoader --> WASM
    WASM --> ONNX
    ONNX --> AudioPlayer
    AudioPlayer --> User

How it's built

The application is built with React 18, TypeScript, and Vite, and uses the @diffusionstudio/vits-web library for model inference. The UI is styled with Tailwind CSS and shadcn/ui components. TTS state is managed by a custom useTTS hook that wraps a TTSEngine class; the engine handles sentence splitting, chunk generation, and audio playback synchronization.
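The sentence-splitting step can be illustrated with a small sketch. This is not the project's actual splitter: the function name splitIntoSentences and the regex-based approach are assumptions made for illustration, and the heuristic is deliberately naive (it will split on abbreviations and decimal points).

```typescript
// Hypothetical helper, not taken from the project's source.
// Splits text into sentence-sized chunks so each can be synthesized separately.
function splitIntoSentences(text: string): string[] {
  // First alternative: a run of non-terminators followed by one or more of . ! ?
  // Second alternative: a trailing run that has no terminator at all.
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  // Trim surrounding whitespace and drop chunks that end up empty.
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

const chunks = splitIntoSentences("Hello world. How are you? Fine!");
console.log(chunks);
```

Splitting before synthesis means the first chunk can start playing while later ones are still being generated, which is what makes the playback feel like streaming.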

How it runs

sequenceDiagram
    participant User
    participant Component
    participant Hook as useTTS
    participant Engine as TTSEngine
    participant WASM as WebAssembly
    participant Audio as HTMLAudioElement
    User->>Component: Click Speak
    Component->>Hook: speak text
    Hook->>Engine: process text
    Engine->>Engine: split into sentences
    loop For each sentence
        Engine->>WASM: generate audio chunk
        WASM-->>Engine: return audio blob
        Engine->>Audio: play chunk
        Audio-->>User: output sound
    end
    Engine-->>Hook: update progress
    Hook-->>Component: update state
    Component-->>User: render progress
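
The per-sentence loop in the diagram can be sketched as a small async pipeline. This is an illustration under stated assumptions, not the TTSEngine implementation: streamSentences, synthesize, play, and onProgress are hypothetical names, and starting synthesis of the next sentence while the current one plays (a one-chunk lookahead) is a design choice of this sketch that the source does not confirm.

```typescript
// Hypothetical pipeline, not the project's TTSEngine.
// `synthesize` stands in for the WebAssembly inference call and
// `play` for playback through an audio element; both are injected.
async function streamSentences<Chunk>(
  sentences: string[],
  synthesize: (sentence: string) => Promise<Chunk>,
  play: (chunk: Chunk) => Promise<void>,
  onProgress: (done: number, total: number) => void = () => {},
): Promise<void> {
  if (sentences.length === 0) return;

  // Start generating the first chunk immediately.
  let next: Promise<Chunk> = synthesize(sentences[0]);

  for (let i = 0; i < sentences.length; i++) {
    const current = await next;

    // Assumption: kick off the next chunk before playing the current one,
    // so generation overlaps with playback.
    if (i + 1 < sentences.length) {
      next = synthesize(sentences[i + 1]);
    }

    await play(current);
    onProgress(i + 1, sentences.length);
  }
}
```

In a browser, play would typically resolve when the audio element fires its ended event, so chunks are heard in order without gaps caused by waiting on inference.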

How to apply & reuse

Integrate the useTTS hook into any React component to add voice capabilities. Configure voice ID, speed, and volume via hook options. Use the returned speak, stop, pause, and resume methods to control playback. Monitor state variables like isReady, isPlaying, and progress to update UI elements dynamically.
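
As a rough illustration of driving UI from those state variables, the sketch below maps hook state to button and progress-bar properties. The TTSStateSketch shape, the deriveControls helper, and the reading of progress as a fraction between 0 and 1 are all assumptions; check the hook's actual return type before relying on any of them.

```typescript
// Hypothetical shape of the state exposed by the hook.
// Assumption: `progress` is a fraction in the range 0..1.
interface TTSStateSketch {
  isReady: boolean;
  isPlaying: boolean;
  progress: number;
}

interface ControlsView {
  buttonLabel: string;
  buttonDisabled: boolean;
  progressPercent: number;
}

// Pure helper that turns hook state into values a component can render.
function deriveControls(state: TTSStateSketch): ControlsView {
  // Clamp so a slightly out-of-range value cannot break the progress bar.
  const clamped = Math.min(1, Math.max(0, state.progress));
  let buttonLabel = "Speak";
  if (!state.isReady) {
    buttonLabel = "Loading voice";
  } else if (state.isPlaying) {
    buttonLabel = "Stop";
  }
  return {
    buttonLabel,
    buttonDisabled: !state.isReady,
    progressPercent: Math.round(clamped * 100),
  };
}

console.log(deriveControls({ isReady: true, isPlaying: true, progress: 0.5 }));
```

Keeping this mapping in a pure function makes it easy to test without rendering a component or loading a model.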

At a glance

Capabilities: Real-time neural speech synthesis, Parallel multi-core processing, Streaming audio playback, Offline model caching, Voice selection and configuration, Playback control (pause, resume, stop)
Components: useTTS React hook, TTSEngine class, VITS WebAssembly models, Sentence splitter utility, shadcn/ui components, Tailwind CSS styles
Tech: TypeScript, React, Vite, WebAssembly, ONNX Runtime, Tailwind CSS, shadcn/ui
Depends on: @diffusionstudio/vits-web, react, typescript, vite, tailwindcss, clsx, tailwind-merge
Integrates with: Browser IndexedDB, Web Audio API, Web Workers
Patterns: Custom React hooks, Class-based engine logic, Streaming data processing, Client-side caching, Component composition
Reuse tags: tts, webassembly, onnx, react, client-side-ai, offline-first, privacy-focused
