Archetypes — the same subpart across all flows

Tier 4. Stop asking “is this flow's confirm phase weak?” and start asking “are our confirm phases weak everywhere?”

What & why

Every flow has a collect phase, most have a confirm phase, and several have an action phase. These are the same functional subpart under different state names. An archetype groups them into one bucket per function (every flow's collect, every confirm, every action) and measures each bucket library-wide. That exposes systemic weakness: if the action archetype is weak on average, the thing to fix is the pattern, not one flow. It is the difference between “fix this bug” and “fix this class of bug.”

How it works — segment inference, then canonicalization

Two steps. First, each state is assigned a segment (riff/flow_eval/segments.py) by a precedence rule: an authored segment: on the state wins; else its lifecycle_phase; else the speech_act family (ask/listen/static → collect, confirm → confirm, act → action, and so on). So grouping needs no authoring by default, but an authored segment: gives stable IDs for trend analysis.
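
The precedence rule can be sketched as below. This is a minimal illustration, not the code in riff/flow_eval/segments.py: the function name infer_segment, the SPEECH_ACT_SEGMENT table, and the assumption that a state is a plain dict with segment, lifecycle_phase and speech_act keys are all hypothetical. Only the speech-act pairs named above are listed; the real table has more ("and so on").

```python
from typing import Optional

# Hypothetical lookup table: only the speech_act -> segment pairs named in the text.
SPEECH_ACT_SEGMENT = {
    "ask": "collect",
    "listen": "collect",
    "static": "collect",
    "confirm": "confirm",
    "act": "action",
}


def infer_segment(state: dict) -> Optional[str]:
    """Pick a segment by precedence: authored segment, then lifecycle_phase, then speech_act."""
    authored = state.get("segment")
    if authored:
        return authored  # an authored segment always wins
    phase = state.get("lifecycle_phase")
    if phase:
        return phase  # next best: the lifecycle phase label
    # Fallback: map the speech_act family; unknown families yield None in this sketch.
    return SPEECH_ACT_SEGMENT.get(state.get("speech_act"))
```

For example, a state carrying only speech_act "listen" lands in collect, while the same state with an authored segment keeps the authored label regardless of its other fields.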

Second, because flows label inconsistently (one tags by lifecycle phase triage/book/enrich, another by speech-act family collect/action), eval_db.py applies a canonical archetype taxonomy (_CANON_ARCHETYPE) that folds the variants into one shared vocabulary — enrich/gather → collect, book/dispatch/commit → action, readback → confirm. Without this, austin's book and enrich would sit in lonely one-flow buckets instead of joining the rest of the library's action and collect phases (commit 4cd7998). Each archetype carries a typed goal (_ARCHETYPE_GOAL): collect → slot_yield, confirm → confirmation_success, action → tool_success.
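
A rough sketch of the folding step, assuming both tables are plain dicts. The names CANON_ARCHETYPE, ARCHETYPE_GOAL and canonical_archetype are stand-ins for illustration; the real _CANON_ARCHETYPE and _ARCHETYPE_GOAL in eval_db.py may hold more entries and a different shape. Labels with no mapping stated above (triage, for instance) pass through unchanged here, which is an assumption of this sketch.

```python
# Illustrative copies of the mappings named in the text (not the real tables).
CANON_ARCHETYPE = {
    "enrich": "collect",
    "gather": "collect",
    "book": "action",
    "dispatch": "action",
    "commit": "action",
    "readback": "confirm",
}

ARCHETYPE_GOAL = {
    "collect": "slot_yield",
    "confirm": "confirmation_success",
    "action": "tool_success",
}


def canonical_archetype(segment: str) -> str:
    """Fold a variant segment label into the shared vocabulary.

    Labels that are already canonical, or that have no known mapping,
    are returned unchanged (an assumption of this sketch).
    """
    return CANON_ARCHETYPE.get(segment, segment)
```

With this in place, a flow that tags a phase as book and another that tags it as action both resolve to action, and ARCHETYPE_GOAL["action"] gives the typed goal they are scored against.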

Use case

You are deciding what to invest in next. Viewed per flow, a dozen flows each have a slightly weak phase: is that noise, or a pattern? The archetype view answers it: if collect averages 0.92 across 18 flows but action averages 0.62 across 3, the leverage is in the action pattern (the backend/booking handoff), and a single structural fix there can lift every action-bearing flow at once.

Example

python scripts/eval_db.py --archetype

Real output from the live quality.db:

# Cross-flow ARCHETYPES — the same FSM subpart across ALL flows (worst archetype first)

| Archetype   | goal                              | flows | mean cohesion | spread (worst→best flow)               |
|-------------|-----------------------------------|-------|---------------|----------------------------------------|
| **inform**  | convey info + advance             | 1     | ⚠ 0.57       | apartment_viewing 0.57 → 0.57          |
| **action**  | a tool/program succeeds (tool_…)  | 3     | ⚠ 0.62       | ceo_command_center 0.2 → weather 1.0   |
| **collect** | fill the target slots (slot_yield)| 18    | ✅ 0.92      | coffee 0.78 → austin_plumbing 1.0      |

mean cohesion = how well this subpart-type collaborates AVERAGED across all flows.
A low archetype mean = a SYSTEMIC weakness (fix the pattern, not one flow).

The spread column is as useful as the mean: action ranges from ceo_command_center at 0.2 to weather at 1.0, which tells you the weakness is concentrated, not uniform — so the worst flow is the place to learn the fix that generalizes.

Where it fits

Archetypes sit one level above group cohesion — they reuse the exact same per-(flow, segment) cohesion rows (_cohesion_rows) and average them by canonical archetype. Read group cohesion to fix this flow; read archetypes to decide which class of phase deserves a structural investment across the whole library.
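
The averaging step can be sketched roughly as follows. This is an illustration under stated assumptions, not the real archetype_cohesion: the row shape (flow, segment, cohesion), the function name archetype_summary, and the demo values are all hypothetical, and the demo rows are made up rather than taken from quality.db.

```python
from collections import defaultdict
from statistics import mean


def archetype_summary(rows, canon):
    """Average per-(flow, segment) cohesion by canonical archetype.

    rows  : iterable of (flow, segment, cohesion) tuples (assumed shape)
    canon : dict mapping variant segment labels to canonical archetypes
    """
    buckets = defaultdict(list)
    for flow, segment, cohesion in rows:
        archetype = canon.get(segment, segment)  # unmapped labels pass through
        buckets[archetype].append((cohesion, flow))

    summary = []
    for archetype, scored in buckets.items():
        worst_score, worst_flow = min(scored)  # lowest cohesion in the bucket
        best_score, best_flow = max(scored)    # highest cohesion in the bucket
        summary.append({
            "archetype": archetype,
            "flows": len({flow for _, flow in scored}),  # distinct flows
            "mean": mean(score for score, _ in scored),
            "worst": (worst_flow, worst_score),
            "best": (best_flow, best_score),
        })
    summary.sort(key=lambda row: row["mean"])  # worst archetype first
    return summary


# Made-up rows for illustration only, not taken from quality.db.
demo_rows = [
    ("flow_a", "collect", 1.0),
    ("flow_b", "enrich", 0.5),
    ("flow_a", "book", 0.25),
    ("flow_c", "action", 0.75),
]
demo_canon = {"enrich": "collect", "book": "action"}
demo_summary = archetype_summary(demo_rows, demo_canon)
```

Keeping the worst and best flow next to the mean mirrors the spread column: the mean says which archetype to invest in, and the worst flow says where to start looking.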

Source: scripts/eval_db.py (archetype_cohesion, _CANON_ARCHETYPE, _ARCHETYPE_GOAL), riff/flow_eval/segments.py.