Archetypes — the same subpart across all flows
Tier 4. Stop asking “is this flow's confirm phase weak?” and start asking “are our confirm phases weak everywhere?”
What & why
Every flow has a collect phase, most have a confirm phase, several have an action phase. Those are
the same functional subpart wearing different state names. An archetype
groups all of them into one bucket (every flow's collect, every confirm,
every action) and measures it library-wide. That reveals systemic
weakness: if the action archetype is weak on average, the fix is the pattern,
not one flow. It is the difference between “fix this bug” and “fix this class of
bug.”
How it works — segment inference, then canonicalization
Two steps. First, each state is assigned a segment
(riff/flow_eval/segments.py) by a precedence rule: an authored segment: on
the state wins; else its lifecycle_phase; else the speech_act family
(ask/listen/static → collect,
confirm → confirm, act → action, and so
on). So grouping needs no authoring by default, but an authored segment: gives stable
IDs for trend analysis.
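The precedence rule can be sketched in a few lines. This is a minimal illustration, not the code in riff/flow_eval/segments.py: the function name infer_segment, the dict-shaped state, and the SPEECH_ACT_SEGMENT table are assumptions, and the table lists only the speech-act families named above.

```python
from typing import Optional

# Hypothetical table: only the families named in the text ("and so on" is not filled in).
SPEECH_ACT_SEGMENT = {
    "ask": "collect",
    "listen": "collect",
    "static": "collect",
    "confirm": "confirm",
    "act": "action",
}


def infer_segment(state: dict) -> Optional[str]:
    """Apply the precedence: authored segment, else lifecycle_phase, else speech-act family."""
    if state.get("segment"):
        # An authored segment always wins.
        return state["segment"]
    if state.get("lifecycle_phase"):
        return state["lifecycle_phase"]
    # Fall back to the speech-act family; None if the family is unknown.
    return SPEECH_ACT_SEGMENT.get(state.get("speech_act"))
```

With this shape, a state that carries only a speech_act of ask lands in collect with no authoring, while adding segment: to the state overrides both fallbacks.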
Second, because flows label inconsistently (one tags by lifecycle phase
triage/book/enrich, another by speech-act family
collect/action), eval_db.py applies a
canonical archetype taxonomy (_CANON_ARCHETYPE) that folds the variants
into one shared vocabulary — enrich/gather → collect,
book/dispatch/commit → action,
readback → confirm. Without this, austin's book and
enrich would sit in lonely one-flow buckets instead of joining the rest of the library's
action and collect phases (commit 4cd7998). Each archetype
carries a typed goal (_ARCHETYPE_GOAL): collect → slot_yield,
confirm → confirmation_success, action → tool_success.
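As a rough sketch of the canonicalization step, the two tables can be pictured as plain dictionaries. The entries below are only the ones listed above; the real _CANON_ARCHETYPE and _ARCHETYPE_GOAL in eval_db.py may hold more. The helper canonical_archetype and its pass-through behavior for unmapped labels are assumptions made for illustration.

```python
# Illustrative copies of the mappings described in the text (not the real tables).
CANON_ARCHETYPE = {
    "enrich": "collect",
    "gather": "collect",
    "book": "action",
    "dispatch": "action",
    "commit": "action",
    "readback": "confirm",
}

ARCHETYPE_GOAL = {
    "collect": "slot_yield",
    "confirm": "confirmation_success",
    "action": "tool_success",
}


def canonical_archetype(segment: str) -> str:
    """Fold a variant label into the shared vocabulary.

    Assumption: labels with no entry pass through unchanged.
    """
    return CANON_ARCHETYPE.get(segment, segment)


# A flow tagged by lifecycle phase and one tagged by speech-act family
# end up in the same bucket with the same typed goal.
assert canonical_archetype("book") == canonical_archetype("action") == "action"
assert ARCHETYPE_GOAL[canonical_archetype("enrich")] == "slot_yield"
```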
Use case
You are deciding what to invest in next. Viewed per flow, a dozen flows each have a slightly weak phase:
is that noise, or a pattern? The archetype view answers it: if collect averages 0.92 across 18
flows but action averages 0.62 across 3, the leverage is in the action pattern (the
backend/booking handoff), and a single structural fix there lifts every action-bearing flow at once.
Example
python scripts/eval_db.py --archetype
Real output from the live quality.db:
# Cross-flow ARCHETYPES — the same FSM subpart across ALL flows (worst archetype first)

| Archetype   | goal                              | flows | mean cohesion | spread (worst→best flow)               |
|-------------|-----------------------------------|-------|---------------|----------------------------------------|
| **inform**  | convey info + advance             | 1     | ⚠ 0.57        | apartment_viewing 0.57 → 0.57          |
| **action**  | a tool/program succeeds (tool_…)  | 3     | ⚠ 0.62        | ceo_command_center 0.2 → weather 1.0   |
| **collect** | fill the target slots (slot_yield)| 18    | ✅ 0.92       | coffee 0.78 → austin_plumbing 1.0      |

mean cohesion = how well this subpart-type collaborates AVERAGED across all flows.
A low archetype mean = a SYSTEMIC weakness (fix the pattern, not one flow).
The spread column is as useful as the mean: action ranges from
ceo_command_center at 0.2 to weather at 1.0, which tells you the weakness is
concentrated, not uniform — so the worst flow is the place to learn the fix that generalizes.
Where it fits
Archetypes sit one level above group cohesion — they
reuse the exact same per-(flow, segment) cohesion rows (_cohesion_rows) and average them
by canonical archetype. Read group cohesion to fix this flow; read archetypes to decide
which class of phase deserves a structural investment across the whole library.
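The roll-up from per-(flow, segment) rows to one line per archetype might look like the sketch below. It is not the eval_db.py implementation: the function archetype_summary, the tuple row shape, and the sample flows and scores are all hypothetical, and it assumes each row already carries its canonical archetype and is weighted equally in the mean.

```python
from collections import defaultdict
from statistics import mean


def archetype_summary(rows):
    """Average cohesion per archetype and report the worst and best flow.

    rows: iterable of (flow, archetype, cohesion) tuples (hypothetical shape).
    Returns one dict per archetype, worst mean first.
    """
    by_archetype = defaultdict(list)
    for flow, archetype, cohesion in rows:
        by_archetype[archetype].append((cohesion, flow))

    summary = []
    for archetype, scored in by_archetype.items():
        scored.sort()  # ascending by cohesion, so worst flow comes first
        summary.append({
            "archetype": archetype,
            "flows": len({flow for _, flow in scored}),
            "mean": round(mean(c for c, _ in scored), 2),
            "worst": scored[0][1],
            "best": scored[-1][1],
        })
    summary.sort(key=lambda entry: entry["mean"])
    return summary


# Made-up rows for illustration only; not taken from quality.db.
rows = [
    ("flow_a", "action", 0.2),
    ("flow_b", "action", 1.0),
    ("flow_a", "collect", 0.9),
    ("flow_b", "collect", 0.8),
]
for entry in archetype_summary(rows):
    print(entry)
```

Keeping the worst and best flow next to the mean is what makes the spread column possible: a low mean with a wide spread points at one flow to study, while a low mean with a narrow spread points at the pattern itself.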