The CI Quality Gate

A per-commit regression gate that is fast and free because it scores only the deterministic computed aspects and never calls an LLM judge. The expensive judged matrix runs as a separate nightly job.

What & why

You want to catch a flow regression before it lands, on every push. But the judged humanness matrix is slow and costs API money, so you can't run it per commit. The resolution is the computed/judged split: the computed aspects (correctness, completion) are deterministic, so a per-commit gate can run them on just the changed flows, compare the results to the baseline stored in the DB, and fail the build on a regression. All of this takes ~22 seconds per flow, with zero judge calls.

The cost boundary. Cheap LINT + computed-aspect gate → every commit (CI / pre-push). Expensive JUDGED matrix → nightly. This is the same boundary the flow build pipeline uses: lint+load on every edit, the judged milestone gate only while bringing a new flow up to the bar.

How it works

1. `scripts/ci_eval.sh` — the gate body

Determine which flows to gate: explicit args, else the flows touched by the last commit (tutorial/module/underscore flows excluded).
Run computed aspects only — fast-tier agent (qwen-flash), 2 personas (cooperative,skeptical), --repeat 1, --no-judge. Deterministic, no LLM judge.
Ingest the computed aspects into aspect_scores.
Run eval_db.py --gate — its exit code is the build verdict.
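
The flow-selection step above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `flows_from_paths` and `select_flows`, the `flows/<name>.yaml` layout, and the exact exclusion rule (names starting with `_`, `tutorial`, or `module`) are hypothetical, and the real script may decide differently.

```python
from pathlib import PurePosixPath

# Assumed exclusion rule: the source only says tutorial/module/underscore
# flows are excluded; matching on these name prefixes is a guess.
EXCLUDED_PREFIXES = ("_", "tutorial", "module")


def flows_from_paths(changed_paths):
    """Map changed file paths to the sorted, de-duplicated flow names to gate."""
    flows = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Assumed layout: one flow per file at flows/<name>.yaml
        if path.parent != PurePosixPath("flows") or path.suffix != ".yaml":
            continue
        if path.stem.startswith(EXCLUDED_PREFIXES):
            continue
        flows.add(path.stem)
    return sorted(flows)


def select_flows(explicit_args, changed_paths):
    """Explicit arguments win; otherwise fall back to the changed flows."""
    return list(explicit_args) if explicit_args else flows_from_paths(changed_paths)
```

An empty result from `select_flows` would mean there is nothing to gate, which matches the hook's instant-skip behaviour described later.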

2. `eval_db.py --gate` — the regression check

For each flow it compares the latest run's correctness and completion scores to the previous run, and fails (exit 1) on either a drop of more than 1.0 point or an absolute score below the floor (6.0). It prints a verdict line per flow:

  ✅ weather: correctness 10.0
  ❌ austin_plumbing: correctness 4.2 < floor 6.0; completion regressed 10.0->7.0

GATE: FAIL (1 flow(s) regressed)
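
The two failure conditions can be written down as a small Python sketch. It is not the code in `eval_db.py`: the function names, the dict-based inputs, and the `GATE: PASS` line are assumptions for illustration, and the real gate may report overlapping reasons differently. Only the thresholds (drop > 1.0, floor 6.0) and the shape of the verdict lines come from the description above.

```python
MAX_DROP = 1.0   # fail when a score falls by more than this vs. the previous run
FLOOR = 6.0      # fail when a score is below this absolute value
ASPECTS = ("correctness", "completion")


def check_flow(latest, previous):
    """Return the failure reasons for one flow (an empty list means pass).

    `latest` and `previous` map aspect name -> score. `previous` may be
    None when there is no earlier run to compare against (assumed case).
    """
    reasons = []
    for aspect in ASPECTS:
        score = latest[aspect]
        if score < FLOOR:
            reasons.append(f"{aspect} {score:.1f} < floor {FLOOR:.1f}")
        before = (previous or {}).get(aspect)
        if before is not None and before - score > MAX_DROP:
            reasons.append(f"{aspect} regressed {before:.1f}->{score:.1f}")
    return reasons


def gate(runs):
    """`runs` maps flow -> (latest, previous). Returns the process exit code."""
    failed = 0
    for flow, (latest, previous) in runs.items():
        reasons = check_flow(latest, previous)
        if reasons:
            failed += 1
            print(f"❌ {flow}: " + "; ".join(reasons))
        else:
            print(f"✅ {flow}: correctness {latest['correctness']:.1f}")
    # "GATE: PASS" is an assumed label; the source only shows the FAIL line.
    print(f"GATE: FAIL ({failed} flow(s) regressed)" if failed else "GATE: PASS")
    return 1 if failed else 0
```

Note that a drop of exactly 1.0 point passes in this sketch, because the rule is stated as strictly greater than 1.0.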

3. The pre-push hook

scripts/hooks/pre-push gates only the flows changed in the commits being pushed (diffed against origin/main), and instant-skips when no flow YAML changed. Install it once per clone:

scripts/install_hooks.sh        # symlinks .git/hooks/pre-push -> scripts/hooks/pre-push

# bypass once when you need to:
git push --no-verify
RIFF_SKIP_EVAL=1 git push
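
The hook's skip-or-gate decision can be sketched as follows. The real hook is a script under `scripts/hooks/`, so this Python version is only an illustration: `pre_push_decision` and `changed_paths` are hypothetical names, the `git diff --name-only origin/main...HEAD` invocation is an assumed way to diff against origin/main, and treating `flows/*.yaml` (or `.yml`) as "flow YAML" is a guess at the layout.

```python
import subprocess


def changed_paths(base="origin/main"):
    """List files changed relative to `base` (assumed diff invocation)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def pre_push_decision(paths, env):
    """Return "skip" or "gate" for a push, given changed paths and environment."""
    # Documented bypass: RIFF_SKIP_EVAL=1 git push
    if env.get("RIFF_SKIP_EVAL") == "1":
        return "skip"
    # Instant-skip when no flow YAML changed (path pattern is an assumption).
    touches_flow = any(
        p.startswith("flows/") and p.endswith((".yaml", ".yml")) for p in paths
    )
    return "gate" if touches_flow else "skip"
```

A caller would pass `changed_paths()` and `os.environ` to `pre_push_decision`. The `git push --no-verify` bypass needs no handling here, because git does not run the pre-push hook at all when that flag is given.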

Use case

You edit flows/coffee.yaml to reorder the collect questions and push. The hook detects the YAML change, runs the computed-aspect gate on just coffee (~22s), and finds completion dropped from 10 to 7 because your reorder stranded a conditional slot. The push is blocked with an exact verdict line — before a regression ever reaches main or the nightly judged run.

Example

# gate explicit flows
scripts/ci_eval.sh austin_plumbing weather

# or let it auto-detect flows changed in the last commit
scripts/ci_eval.sh

# the gate alone, against whatever is already in the DB
python scripts/eval_db.py --gate austin_plumbing weather

Why judge-free is the right call here

Because the gate scores only computed aspects, it has no noise floor — a fail is a real regression, not a judge having a bad night. (The judged humanness number swings ~1.5pt run to run; gating on it would produce flaky red builds.) Humanness regressions are caught by the nightly matrix and human review, not the blocking gate. See Methodology for the noise-floor evidence.
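
To make the flakiness argument concrete, here is a toy simulation of an unchanged flow scored twice. It is a sketch under a loud assumption: noise is modelled as uniform over a total range of `swing` points, which is only one reading of "swings ~1.5pt run to run" and is not the measured judge behaviour. The function name `false_fail_rate` is hypothetical.

```python
import random

MAX_DROP = 1.0  # the gate's regression threshold


def false_fail_rate(swing, trials=10_000, seed=0):
    """Fraction of trials where an unchanged flow "regresses" by noise alone.

    Assumption: each run's score is the true score plus uniform noise
    spanning `swing` points in total; real judge noise may be distributed
    differently.
    """
    rng = random.Random(seed)
    half = swing / 2
    fails = 0
    for _ in range(trials):
        previous = rng.uniform(-half, half)  # noise on the previous run
        latest = rng.uniform(-half, half)    # noise on the latest run
        if previous - latest > MAX_DROP:
            fails += 1
    return fails / trials
```

With `swing=0.0` (the deterministic computed aspects) the rate is exactly zero, while any swing wider than the 1.0-point threshold, such as 1.5, produces some spurious failures under this model.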

Where it fits

The CI gate is the per-commit floor of the whole system. It reuses the exact same --ingest-aspects path as the matrix, so what it gates and what the matrix reports are the same numbers. Above it sit the nightly judged matrix and the replay regression check.

Source: scripts/ci_eval.sh, scripts/hooks/pre-push, scripts/install_hooks.sh, scripts/eval_db.py (gate).
