
The CI Quality Gate

A per-commit regression gate that is fast and free — because it scores only the deterministic computed aspects, never an LLM judge. The expensive judged matrix is a separate nightly job.

What & why

You want to catch a flow regression before it lands, on every push. But the judged humanness matrix is slow and costs API money — you can't run it per commit. The resolution is the computed/judged split: the computed aspects (correctness, completion) are deterministic, so a per-commit gate can run them on just the changed flows, compare to the DB baseline, and fail the build on a regression — all in ~22 seconds per flow, with zero judge calls.

The cost boundary. Cheap LINT + computed-aspect gate → every commit (CI / pre-push). Expensive JUDGED matrix → nightly. This is the same boundary the flow build pipeline uses: lint+load on every edit, the judged milestone gate only while bringing a new flow up to the bar.

How it works

1. scripts/ci_eval.sh — the gate body

  1. Determine which flows to gate: explicit args, else the flows touched by the last commit (tutorial/module/underscore flows excluded).
  2. Run computed aspects only — fast-tier agent (qwen-flash), 2 personas (cooperative,skeptical), --repeat 1, --no-judge. Deterministic, no LLM judge.
  3. Ingest the computed aspects into aspect_scores.
  4. Run eval_db.py --gate — its exit code is the build verdict.
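Step 1 can be sketched as a small selection function. This is an illustration only, not the code in scripts/ci_eval.sh: the function name, the flows/ directory layout, and the rule that excluded flows are those whose file name starts with "_", "tutorial", or "module" are all assumptions.

```python
from pathlib import PurePosixPath

# Assumed naming rule for flows the gate skips (hypothetical).
EXCLUDED_PREFIXES = ("_", "tutorial", "module")


def flows_to_gate(explicit_args, changed_paths):
    """Pick the flows to gate: explicit args win, else changed flow YAML files."""
    if explicit_args:
        return list(explicit_args)
    names = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Only YAML files directly under flows/ count as flows (assumed layout).
        if path.parent != PurePosixPath("flows"):
            continue
        if path.suffix not in (".yaml", ".yml"):
            continue
        name = path.stem
        if name.startswith(EXCLUDED_PREFIXES):
            continue
        if name not in names:
            names.append(name)
    return names


changed = ["flows/coffee.yaml", "flows/_shared.yaml",
           "flows/tutorial_intro.yaml", "README.md"]
print(flows_to_gate([], changed))
print(flows_to_gate(["weather"], changed))
```

In this sketch the exclusion applies only to auto-detected flows; explicitly named flows are gated as given.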

2. eval_db.py --gate — the regression check

For each flow, it compares the latest run's correctness and completion scores to the previous run's, and fails (exit 1) if either score drops by more than 1.0 point or falls below the absolute floor (6.0). It prints a verdict line per flow:

  ✅ weather: correctness 10.0
  ❌ austin_plumbing: correctness 4.2 < floor 6.0; completion regressed 10.0->7.0

  GATE: FAIL (1 flow(s) regressed)
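The rule above can be sketched in a few lines. This is a minimal illustration, not the code in eval_db.py: the function names, the dict-based input, the PASS/FAIL text markers, and the "GATE: PASS" line are assumptions. Only the 1.0 drop threshold, the 6.0 floor, and the exit code come from the description.

```python
FLOOR = 6.0     # absolute floor quoted in the text
MAX_DROP = 1.0  # largest tolerated drop versus the previous run
ASPECTS = ("correctness", "completion")


def check_flow(latest, previous):
    """Return a list of problems for one flow; an empty list means pass."""
    problems = []
    for aspect in ASPECTS:
        score = latest[aspect]
        if score < FLOOR:
            problems.append(f"{aspect} {score:.1f} < floor {FLOOR:.1f}")
        prev = previous.get(aspect)
        # Strictly greater: a drop of exactly 1.0 still passes.
        if prev is not None and prev - score > MAX_DROP:
            problems.append(f"{aspect} regressed {prev:.1f}->{score:.1f}")
    return problems


def gate(runs):
    """runs maps flow name -> (latest, previous). Returns (exit_code, lines)."""
    lines, failed = [], 0
    for flow, (latest, previous) in runs.items():
        problems = check_flow(latest, previous)
        if problems:
            failed += 1
            lines.append(f"FAIL {flow}: " + "; ".join(problems))
        else:
            lines.append(f"PASS {flow}")
    if failed:
        lines.append(f"GATE: FAIL ({failed} flow(s) regressed)")
    else:
        lines.append("GATE: PASS")  # wording assumed, not from the source
    return (1 if failed else 0), lines


runs = {
    "weather": ({"correctness": 10.0, "completion": 10.0},
                {"correctness": 10.0, "completion": 10.0}),
    "austin_plumbing": ({"correctness": 4.2, "completion": 7.0},
                        {"correctness": 5.0, "completion": 10.0}),
}
exit_code, lines = gate(runs)
print("\n".join(lines))
```

Both checks are applied independently per aspect, so one flow can report a floor violation and a regression on the same line, as in the sample output.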

3. The pre-push hook

scripts/hooks/pre-push gates only the flows changed in the commits being pushed (diffed against origin/main) and skips immediately when no flow YAML changed. Install it once per clone:

scripts/install_hooks.sh        # symlinks .git/hooks/pre-push -> scripts/hooks/pre-push

# bypass once when you need to:
git push --no-verify
RIFF_SKIP_EVAL=1 git push
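The hook's skip logic might look roughly like the sketch below. It is not the contents of scripts/hooks/pre-push: the function names, the origin/main...HEAD diff range, the flows/ path prefix, and treating only the value "1" as a skip are assumptions. The RIFF_SKIP_EVAL variable and the skip when no flow YAML changed come from the text.

```python
import os
import subprocess


def diff_names(base="origin/main"):
    """List files changed relative to base (assumed range; the real hook may differ)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def hook_decision(changed_paths, env):
    """Return (run_gate, detail): whether to run the gate, and on what or why not."""
    if env.get("RIFF_SKIP_EVAL") == "1":
        return False, "RIFF_SKIP_EVAL set"
    flow_files = [p for p in changed_paths
                  if p.startswith("flows/") and p.endswith((".yaml", ".yml"))]
    if not flow_files:
        return False, "no flow YAML changed"
    # Strip the directory and extension to get flow names for the gate.
    names = [os.path.splitext(os.path.basename(p))[0] for p in flow_files]
    return True, " ".join(names)


# In a real clone: hook_decision(diff_names(), os.environ)
print(hook_decision(["flows/coffee.yaml", "README.md"], {}))
print(hook_decision(["README.md"], {}))
print(hook_decision(["flows/coffee.yaml"], {"RIFF_SKIP_EVAL": "1"}))
```

Note that git push --no-verify bypasses the hook entirely, so no hook code runs in that case.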

Use case

You edit flows/coffee.yaml to reorder the collect questions and push. The hook detects the YAML change, runs the computed-aspect gate on just coffee (~22s), and finds completion dropped from 10 to 7 because your reorder stranded a conditional slot. The push is blocked with an exact verdict line — before a regression ever reaches main or the nightly judged run.

Example

# gate explicit flows
scripts/ci_eval.sh austin_plumbing weather

# or let it auto-detect flows changed in the last commit
scripts/ci_eval.sh

# the gate alone, against whatever is already in the DB
python scripts/eval_db.py --gate austin_plumbing weather

Why judge-free is the right call here

Because the gate scores only computed aspects, it has no noise floor — a fail is a real regression, not a judge having a bad night. (The judged humanness number swings ~1.5pt run to run; gating on it would produce flaky red builds.) Humanness regressions are caught by the nightly matrix and human review, not the blocking gate. See Methodology for the noise-floor evidence.
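To see why a noisy score makes a poor blocking gate, consider a toy simulation. It assumes an unchanged flow whose true score is 8.0 and models the run-to-run swing as independent uniform noise of ±0.75 per run, so two runs can differ by up to 1.5 points. The noise model, the true score, and the function name are illustrative assumptions, not measurements from this system.

```python
import random

MAX_DROP = 1.0  # the gate's drop threshold


def false_fail_rate(noise_half_width, trials=1000, seed=0):
    """Fraction of trials where an unchanged flow trips the drop check."""
    rng = random.Random(seed)
    true_score = 8.0  # hypothetical flow that did not change
    fails = 0
    for _ in range(trials):
        prev = true_score + rng.uniform(-noise_half_width, noise_half_width)
        latest = true_score + rng.uniform(-noise_half_width, noise_half_width)
        if prev - latest > MAX_DROP:
            fails += 1
    return fails / trials


computed_rate = false_fail_rate(0.0)   # deterministic aspects: no noise
judged_rate = false_fail_rate(0.75)    # assumed judged-score noise

print(f"computed false-fail rate: {computed_rate}")
print(f"judged false-fail rate:   {judged_rate}")
```

With zero noise the unchanged flow never fails the drop check, while under the assumed noise model some fraction of identical runs would turn the build red with no underlying regression.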

Where it fits

The CI gate is the per-commit floor of the whole system. It reuses the exact same --ingest-aspects path as the matrix, so what it gates and what the matrix reports are the same numbers. Above it sit the nightly judged matrix and the replay regression check.