Droid

A local text-to-speech experiment for android-style voice design. The first pass compares core DSP/vocoder paths; the new Space Channel pass tests ten network-operations droid voices against the same prompt set.

10 Space Channel presets
66 playable samples
0.5 s A/B/D generation, typical
Local: macOS, Piper, NumPy/SciPy

Space Channel Droid Voices

Ten new presets tuned from the current MVP feedback: keep the metallic a_dsp character, reduce scratch, and push toward a Euro/protocol network-operations voice.

retirement_party_mvp space_ops_switchboard uap_archive_vocoder bug_intake_clerk deep_space_radio

Genesis Of The Variants

a_dsp - Processed Speech Droid

This began as the fastest baseline: synthesize clean Piper speech, then make the voice mechanical with ring modulation, comb filtering, bit-crush, tone leak, and soft clipping. It keeps the words clear because the human-like TTS remains the dominant layer.
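A minimal NumPy sketch of this kind of effect chain follows. It is an illustration, not the project's actual code: the function names, the parameter values (70 Hz ring carrier, 4 ms comb delay, 8-bit crush), and the 22050 Hz sample rate are all assumptions, and the tone-leak stage is left out for brevity. The input is assumed to be mono float speech in the range [-1, 1].

```python
import numpy as np

def ring_mod(x, sr, freq_hz=70.0, depth=0.5):
    """Blend the dry signal with a copy multiplied by a sine carrier."""
    t = np.arange(len(x)) / sr
    carrier = np.sin(2 * np.pi * freq_hz * t)
    return (1.0 - depth) * x + depth * x * carrier

def comb_filter(x, sr, delay_ms=4.0, gain=0.4):
    """Feedforward comb: add a short delayed copy for a metallic resonance."""
    d = max(1, int(sr * delay_ms / 1000.0))
    y = x.copy()
    y[d:] += gain * x[:-d]
    return y

def bit_crush(x, bits=8):
    """Quantize amplitude to a coarse grid to add digital grit."""
    levels = 2 ** (bits - 1)
    return np.round(x * levels) / levels

def soft_clip(x, drive=2.0):
    """tanh saturation; the output always stays within [-1, 1]."""
    return np.tanh(drive * x)

def droid_chain(speech, sr):
    """Apply the stages in the order the text lists them (parameters assumed)."""
    x = np.asarray(speech, dtype=np.float64)
    x = ring_mod(x, sr)
    x = comb_filter(x, sr)
    x = bit_crush(x)
    return soft_clip(x)
```

Because the dry speech still passes through at half strength in `ring_mod`, the words stay recognizable, which matches the intent described above. Raising `depth` or lowering `bits` would trade clarity for more machine character.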

b_vocoder - Carrier-Synthesis Droid

This variant came from the classic robot-vocoder idea: extract the speech envelope and use it to drive a saw/square synthetic carrier. It is the most machine-generated path, with stronger droid character and less natural intelligibility.
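The envelope-drives-carrier idea can be sketched in a few lines. This is a deliberately simplified single-band version; a classic channel vocoder would split the speech into several frequency bands and apply one envelope per band. The function names, the 10 ms smoothing window, and the 110 Hz carrier pitch are assumptions chosen for illustration.

```python
import numpy as np

def envelope(x, sr, window_ms=10.0):
    """Rectify, then smooth with a moving average to follow loudness."""
    n = max(1, int(sr * window_ms / 1000.0))
    kernel = np.ones(n) / n
    return np.convolve(np.abs(x), kernel, mode="same")

def saw_carrier(num_samples, sr, freq_hz=110.0):
    """Naive (not band-limited) sawtooth in the range [-1, 1)."""
    phase = (np.arange(num_samples) * freq_hz / sr) % 1.0
    return 2.0 * phase - 1.0

def simple_vocoder(speech, sr, freq_hz=110.0):
    """Replace the voice with a synthetic carrier shaped by the speech envelope."""
    x = np.asarray(speech, dtype=np.float64)
    return envelope(x, sr) * saw_carrier(len(x), sr, freq_hz)
```

With only one band, the output keeps the rhythm of the speech but loses most of the vowel and consonant detail, which is an exaggerated form of the intelligibility cost noted above.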

c_light - Premium Android

This tests the opposite hypothesis: start from a higher-quality Piper voice and apply only subtle metallic treatment. It is meant to sound like a cleaner protocol-style android rather than a harsh machine.

d_hybrid - Vocoder-Forward Hybrid

This was retuned after the first blend sounded too much like a_dsp. The new version makes the vocoder the primary layer and mixes in a smaller processed-speech layer only to recover clarity. It should sit closer to b_vocoder than to a_dsp.
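A vocoder-forward blend could look like the sketch below. The 0.7 weight is an assumed value, not the project's tuned setting, and `hybrid_mix` is a hypothetical helper name. Both inputs are assumed to be mono float arrays rendered from the same utterance.

```python
import numpy as np

def hybrid_mix(vocoded, processed, vocoder_weight=0.7):
    """Weighted sum with the vocoder layer dominant (weight is an assumption)."""
    v = np.asarray(vocoded, dtype=np.float64)
    p = np.asarray(processed, dtype=np.float64)
    n = min(len(v), len(p))  # guard against off-by-a-few length mismatches
    mix = vocoder_weight * v[:n] + (1.0 - vocoder_weight) * p[:n]
    peak = np.max(np.abs(mix)) if n else 0.0
    # Only scale down when the sum would clip; otherwise leave levels alone.
    return mix / peak if peak > 1.0 else mix
```

Any weight above 0.5 keeps the vocoder as the primary layer; the processed-speech layer then contributes mainly consonant detail.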

Listen

Affirmative. All systems nominal. Awaiting your command.

Variant     Generation time
a_dsp       0.504 s
b_vocoder   0.537 s
c_light     0.891 s
d_hybrid    0.544 s

I am a synthetic intelligence. State your designation.

Variant     Generation time
a_dsp       0.501 s
b_vocoder   0.525 s
c_light     0.820 s
d_hybrid    0.530 s

Directive seven one four acknowledged. Power cell at forty two percent.

Variant     Generation time
a_dsp       0.525 s
b_vocoder   0.553 s
c_light     0.942 s
d_hybrid    0.568 s

I do not experience fear, but I recognize the urgency of your request.

Variant     Generation time
a_dsp       0.530 s
b_vocoder   0.575 s
c_light     1.013 s
d_hybrid    0.587 s

What This Shows

The effects are cheap. Generation time is dominated by launching Piper and synthesizing speech. a_dsp, b_vocoder, and d_hybrid all share the same Piper render, so they cluster around the same latency (0.50 to 0.59 s); b_vocoder and d_hybrid add only about 25 to 60 ms over a_dsp for vocoder-band processing. c_light is slower (0.82 to 1.01 s) because it uses a higher-quality Piper voice.
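The per-prompt overhead relative to a_dsp can be checked directly from the numbers in the Listen section. The snippet below copies those four timings per variant by hand; `overhead_ms` is a helper name introduced here for illustration.

```python
# Generation times in seconds, in the order the prompts appear under Listen.
TIMINGS = {
    "a_dsp":     [0.504, 0.501, 0.525, 0.530],
    "b_vocoder": [0.537, 0.525, 0.553, 0.575],
    "c_light":   [0.891, 0.820, 0.942, 1.013],
    "d_hybrid":  [0.544, 0.530, 0.568, 0.587],
}

def overhead_ms(variant, baseline="a_dsp"):
    """Extra milliseconds per prompt compared with the baseline variant."""
    return [
        round((v - b) * 1000)
        for v, b in zip(TIMINGS[variant], TIMINGS[baseline])
    ]

print(overhead_ms("b_vocoder"))  # [33, 24, 28, 45]
print(overhead_ms("d_hybrid"))   # [40, 29, 43, 57]
print(overhead_ms("c_light"))    # [387, 319, 417, 483]
```

The vocoder variants stay within 60 ms of the baseline on every prompt, while c_light costs roughly 320 to 480 ms more.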

For sub-200 ms replies, the practical path is to pre-generate common short responses and keep live synthesis for dynamic text. Full timing data is available in timings.csv.
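One way to structure that split is a small cache that is warmed with the common phrases at startup and falls back to live synthesis on a miss. This is a sketch under assumptions: `ResponseCache` is a hypothetical class, and `synthesize` stands in for whatever function renders text to audio bytes in this project (it is not a Piper API).

```python
from typing import Callable, Dict, Iterable

class ResponseCache:
    """Serve pre-rendered audio for known phrases; synthesize everything else live."""

    def __init__(self, synthesize: Callable[[str], bytes]):
        # `synthesize` is an assumed callable: text in, audio bytes out.
        self._synthesize = synthesize
        self._cache: Dict[str, bytes] = {}

    @staticmethod
    def _key(text: str) -> str:
        # Normalize case and whitespace so trivial variations still hit.
        return " ".join(text.lower().split())

    def warm(self, phrases: Iterable[str]) -> None:
        """Render each common phrase once, ahead of time."""
        for phrase in phrases:
            self._cache[self._key(phrase)] = self._synthesize(phrase)

    def get(self, text: str) -> bytes:
        """Return cached audio if available, otherwise synthesize now."""
        key = self._key(text)
        if key in self._cache:
            return self._cache[key]
        return self._synthesize(text)
```

A cache hit costs only a dictionary lookup, so the roughly 0.5 s Piper launch and render is paid only for dynamic text. Misses are intentionally not stored here, to keep memory bounded when the dynamic text is unbounded.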