Group Cohesion — do a group's states work together?
Tier 2. A phase can be made of states that are each locally fine, yet collectively thrash. Cohesion is the only metric that sees that.
What & why
A group (or segment) is a contiguous functional phase of a flow — the
collection phase, the confirmation phase, the action phase. The middle tier exists because
collaboration is emergent: every state in a collect group can report progress=1,
and yet the group as a whole re-asks across its members and burns 8 turns to fill 3 slots. No single
state looks broken; the group is. Cohesion is the signal for that — “these states
are present but not collaborating.”
How it works — the bounded metric
The naive approach (multiply a few rates) collapses to zero and hides which part failed. The
as-shipped metric keeps three bounded, visible components, computed from the reliable
per-state signals (progress, revisit, stall, dwell)
already in state_scores:
goal_yield = clamp(mean(progress) over the group's states) # did members advance efficiency = clamp(n_states / sum(dwell_turns), 0, 1) # turns vs reference transition_coherence = clamp(1 − mean(revisit) − mean(stall), 0, 1) # clean handoffs group_cohesion = goal_yield · (0.35·efficiency + 0.65·transition_coherence) # 0..1
The 0.65 weight on transition_coherence reflects that how cleanly the states hand
off matters more than raw turn-efficiency. goal_yield is the group's typed success
signal (for collect groups that is members advancing; the design extends this to
confirmation_success for confirm, tool_success for act, and so on).
close, escalation, terminal,
greeting) are endpoints where progress=0 is normal, so a progress-based
goal_yield would falsely score them 0. The query filters them out, and requires
n_states ≥ 2 (a one-state “group” has nothing to cohere).
Use case
Two flows both score “fine” at the state level. Cohesion separates them:
ceo_command_center's 3-state action group has goal_yield 0.86
but transition_coherence 0.0 — its states advance individually but re-enter and stall as
a unit, so cohesion craters to 0.2. A clean collect group like
pet_store sits at 1.0. The number points you at the phase that needs a
structural fix (the GoalSegment contract), not at
any one state.
Example
python scripts/eval_db.py --group-cohesion # all flows; add --flow F for one
Real output (truncated) from the live quality.db:
# group_cohesion — do a group's states work together? (0..1, higher better) | Flow | Segment | states | goal_yield | efficiency | transition_coherence | **cohesion** | |-------------------------|---------|--------|------------|------------|----------------------|--------------| | ceo_command_center | action | 3 | 0.86 | 0.67 | 0.0 | ❌ **0.2** | | apartment_viewing | inform | 3 | 0.74 | 0.43 | 0.96 | ⚠ **0.57** | | property_trouble_ticket | action | 2 | 1.0 | 1.0 | 0.5 | ⚠ **0.68** | | coffee | collect | 3 | 1.0 | 1.0 | 0.67 | ✅ **0.78** | | dog_grooming | collect | 9 | 0.88 | 1.0 | 0.96 | ✅ **0.86** | | coffee_split | collect | 11 | 0.98 | 1.0 | 0.93 | ✅ **0.94** | | austin_plumbing | collect | 6 | 1.0 | 1.0 | 0.94 | ✅ **0.96** | | pet_store | collect | 5 | 1.0 | 1.0 | 1.0 | ✅ **1.0** | goal_yield=members advanced · efficiency=turns vs reference · transition_coherence=1−revisit−stall. Low cohesion = states present but NOT collaborating (thrash/re-ask/loop).
Deriving a group's goal — --segment-goals
A companion view answers “what is this group's goal, and is it met?” for one flow. It
reads the flow YAML, groups states by segment inference,
derives each group's target slots (the union of members' required_slots) and purpose
(member descriptions), then joins the achievement from state_scores:
python scripts/eval_db.py --segment-goals austin_plumbing
Real output (truncated):
# austin_plumbing — groups of states, their GOAL, and whether it's met
■ GROUP 'collect' (10 states) ✅ goal met (advances cleanly)
goal-target: ['blocked_service_days', 'possible_service_days', 'preferred_date',
'preferred_time_part', 'problem_description', 'service_duration_minutes']
goal-purpose: Capture specific date.
achieved: slot_fill=0.22 stall=0.0 progress=1.0
■ GROUP 'enrich' (3 states) ✅ goal met (advances cleanly)
goal-target: ['customer_name', 'customer_phone', 'service_address']
goal-purpose: Collect Name, Phone, and Address (Soft-Locked).
achieved: slot_fill=0.2 stall=0.0 progress=1.0
progress and stall
(derived from transitions). slot_fill is advisory only here — for front-loaded callers
it reads low even when the group does fill its slots, so it must not gate the verdict (commit
f34348e).
Where it fits
Group cohesion sits between per-state metrics (which it aggregates) and the flow matrix (which it explains). Roll it up one more level — the same group kind across all flows — and you get archetypes, the systemic-weakness finder.