Overview › The hierarchy › Group cohesion

Group Cohesion — do a group's states work together?

Tier 2. A phase can be made of states that are each locally fine, yet collectively thrash. Cohesion is the only metric that sees that.

What & why

A group (or segment) is a contiguous functional phase of a flow — the collection phase, the confirmation phase, the action phase. The middle tier exists because collaboration is emergent: every state in a collect group can report progress=1, and yet the group as a whole re-asks across its members and burns 8 turns to fill 3 slots. No single state looks broken; the group is. Cohesion is the signal for that — “these states are present but not collaborating.”

How it works — the bounded metric

The naive approach (multiply a few rates) collapses to zero and hides which part failed. The as-shipped metric keeps three bounded, visible components, computed from the reliable per-state signals (progress, revisit, stall, dwell) already in state_scores:

goal_yield           = clamp(mean(progress) over the group's states)     # did members advance
efficiency           = clamp(n_states / sum(dwell_turns), 0, 1)          # turns vs reference
transition_coherence = clamp(1 − mean(revisit) − mean(stall), 0, 1)      # clean handoffs

group_cohesion       = goal_yield · (0.35·efficiency + 0.65·transition_coherence)   # 0..1

The 0.65 weight on transition_coherence reflects that how cleanly the states hand off matters more than raw turn-efficiency. goal_yield is the group's typed success signal (for collect groups that is members advancing; the design extends this to confirmation_success for confirm, tool_success for act, and so on).

Terminal kinds are excluded. Cohesion is only meaningful where states collaborate — a multi-state, non-terminal group. Terminal kinds (close, escalation, terminal, greeting) are endpoints where progress=0 is normal, so a progress-based goal_yield would falsely score them 0. The query filters them out, and requires n_states ≥ 2 (a one-state “group” has nothing to cohere).

Use case

Two flows both score “fine” at the state level. Cohesion separates them: ceo_command_center's 3-state action group has goal_yield 0.86 but transition_coherence 0.0 — its states advance individually but re-enter and stall as a unit, so cohesion craters to 0.2. A clean collect group like pet_store sits at 1.0. The number points you at the phase that needs a structural fix (the GoalSegment contract), not at any one state.

Example

python scripts/eval_db.py --group-cohesion        # all flows; add --flow F for one

Real output (truncated) from the live quality.db:

# group_cohesion — do a group's states work together? (0..1, higher better)

| Flow                    | Segment | states | goal_yield | efficiency | transition_coherence | **cohesion** |
|-------------------------|---------|--------|------------|------------|----------------------|--------------|
| ceo_command_center      | action  | 3      | 0.86       | 0.67       | 0.0                  | ❌ **0.2**   |
| apartment_viewing       | inform  | 3      | 0.74       | 0.43       | 0.96                 | ⚠ **0.57**  |
| property_trouble_ticket | action  | 2      | 1.0        | 1.0        | 0.5                  | ⚠ **0.68**  |
| coffee                  | collect | 3      | 1.0        | 1.0        | 0.67                 | ✅ **0.78**  |
| dog_grooming            | collect | 9      | 0.88       | 1.0        | 0.96                 | ✅ **0.86**  |
| coffee_split            | collect | 11     | 0.98       | 1.0        | 0.93                 | ✅ **0.94**  |
| austin_plumbing         | collect | 6      | 1.0        | 1.0        | 0.94                 | ✅ **0.96**  |
| pet_store               | collect | 5      | 1.0        | 1.0        | 1.0                  | ✅ **1.0**   |

goal_yield=members advanced · efficiency=turns vs reference · transition_coherence=1−revisit−stall.
Low cohesion = states present but NOT collaborating (thrash/re-ask/loop).

Deriving a group's goal — --segment-goals

A companion view answers “what is this group's goal, and is it met?” for one flow. It reads the flow YAML, groups states by segment inference, derives each group's target slots (the union of members' required_slots) and purpose (member descriptions), then joins the achievement from state_scores:

python scripts/eval_db.py --segment-goals austin_plumbing

Real output (truncated):

# austin_plumbing — groups of states, their GOAL, and whether it's met

■ GROUP 'collect' (10 states)  ✅ goal met (advances cleanly)
   goal-target: ['blocked_service_days', 'possible_service_days', 'preferred_date',
                 'preferred_time_part', 'problem_description', 'service_duration_minutes']
   goal-purpose: Capture specific date.
   achieved: slot_fill=0.22 stall=0.0 progress=1.0

■ GROUP 'enrich' (3 states)  ✅ goal met (advances cleanly)
   goal-target: ['customer_name', 'customer_phone', 'service_address']
   goal-purpose: Collect Name, Phone, and Address (Soft-Locked).
   achieved: slot_fill=0.2 stall=0.0 progress=1.0
The verdict uses the reliable signals, not slot_fill. The “good?” verdict is computed from progress and stall (derived from transitions). slot_fill is advisory only here — for front-loaded callers it reads low even when the group does fill its slots, so it must not gate the verdict (commit f34348e).

Where it fits

Group cohesion sits between per-state metrics (which it aggregates) and the flow matrix (which it explains). Roll it up one more level — the same group kind across all flows — and you get archetypes, the systemic-weakness finder.