Open-source enforcement layer for coding agents: completion is judged by evidence, not by summary.

cortex-loop

Coding agents are fast and confident. They can sound done before the work is actually done. Cortex sits at the completion boundary and checks evidence, not narrative. If the evidence is there, the session closes. If not, it stays open.


MIT License
Claude Code: Verified
Gemini CLI: Coming soon*
OpenAI Codex App Server: Supported

why this exists

The Verification Gap

Most engineers using coding agents have seen this loop. The summary says done, then gaps surface quietly during verification. After enough misses, you check everything yourself and the speed gain disappears.

The agent reports "tests passing, task complete" with a clean, confident summary.

The misses are subtle: a test was described but never ran, a claimed file change does not match the actual diff, or an edge case appears only in prose.

As I worked on bigger projects, I kept seeing the same thing: unless the structure required evidence, the model followed the path of least resistance.

This is a structural problem, not a matter of personal carelessness. Cortex closes that gap by requiring mechanically checkable evidence at completion.

the verification path

How Cortex Verifies

When the agent declares the work done, Cortex runs a fixed pipeline. The stages are not individually novel. The difference is that they are enforced at the session boundary, where real work actually closes. The goal is not to shut the agent down. It is to make the gap clear enough that the agent can do its best work: understand what is missing, fix it, reconsider the approach if it is looping, or say honestly that it is stuck.

stop_contract
challenges
requirements
invariants
graveyard
stop_policy
store

`stop_contract`

Normalize the completion claim into structured, verifiable assertions.

Downstream stages can then check facts deterministically instead of interpreting prose.
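
A minimal sketch of what normalizing a claim could look like, assuming the agent's completion report arrives as a loosely shaped dictionary. The `Assertion` type, the `normalize_claim` function, and the `tests_run` / `files_changed` keys are hypothetical names chosen for illustration, not Cortex's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    kind: str     # hypothetical kinds: "test_ran", "file_changed"
    subject: str  # the test id or file path the claim is about


def normalize_claim(claim: dict) -> list[Assertion]:
    """Flatten a loosely shaped completion claim into checkable assertions."""
    assertions = []
    for test_id in claim.get("tests_run", []):
        assertions.append(Assertion("test_ran", test_id))
    for path in claim.get("files_changed", []):
        assertions.append(Assertion("file_changed", path))
    # Sort so the same claim always yields the same list.
    return sorted(assertions, key=lambda a: (a.kind, a.subject))


claim = {"tests_run": ["test_b", "test_a"], "files_changed": ["src/app.py"]}
for assertion in normalize_claim(claim):
    print(assertion.kind, assertion.subject)
```

Sorting keeps the output stable, so later stages see the same assertion list every time the same claim is submitted.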

`challenges`

Require verifiable evidence across boundary, error, null-input, and regression categories.

Happy-path-only testing does not pass.
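
As a rough illustration, a category gate of this kind can be reduced to "every required category has at least one piece of evidence". Only `boundary_values` appears in Cortex's output on this page; the other category names, `challenges_gate`, and `gate_passes` are assumptions made for this sketch.

```python
# Hypothetical category names; only "boundary_values" is taken from the page.
REQUIRED_CATEGORIES = ("boundary_values", "error_handling", "null_input", "regression")


def challenges_gate(evidence: dict[str, list[str]]) -> dict[str, bool]:
    """Mark a category True only if at least one evidence item backs it."""
    return {category: bool(evidence.get(category)) for category in REQUIRED_CATEGORIES}


def gate_passes(coverage: dict[str, bool]) -> bool:
    """The gate passes only when every required category is covered."""
    return all(coverage.values())


# A happy-path-heavy submission: boundary evidence is missing entirely.
coverage = challenges_gate({
    "error_handling": ["tests/test_errors.py::test_timeout"],
    "null_input": ["tests/test_input.py::test_none"],
    "regression": ["tests/test_regression.py::test_issue"],
})
print(coverage["boundary_values"], gate_passes(coverage))  # False False
```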

`requirements`

Verify concrete claims: tests ran, files changed, and required work is backed by evidence.

"Tests passing" has to map to executed artifacts.

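One way to tie a claim to an executed artifact is to compare a claimed digest against the bytes that actually exist. This is a sketch under assumptions: the page mentions an artifact/hash mismatch but does not say which hash Cortex uses, so SHA-256 and the `artifact_matches` helper are illustrative choices.

```python
import hashlib


def artifact_matches(claimed_sha256: str, artifact: bytes) -> bool:
    """Return True only if the claimed digest matches the artifact's real bytes."""
    actual = hashlib.sha256(artifact).hexdigest()
    return actual == claimed_sha256.strip().lower()


# Stand-in for a captured test log; the content is made up for the example.
test_log = b"collected 12 items\n12 passed\n"
honest_claim = hashlib.sha256(test_log).hexdigest()

print(artifact_matches(honest_claim, test_log))  # True
print(artifact_matches("0" * 64, test_log))      # False
```
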
`invariants`

Run maintainer-defined checks that the agent did not author.

Implementation and self-evaluation share blind spots. Verification has to include checks from outside that context.
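
To make the idea concrete, here is a small sketch of maintainer-owned checks run against a snapshot of the workspace. The `Files` snapshot shape, the `no_skipped_tests` rule, and `run_invariants` are all hypothetical; how Cortex actually defines and loads invariants is not described on this page.

```python
from typing import Callable

Files = dict[str, str]               # path -> file contents (hypothetical snapshot)
Invariant = Callable[[Files], bool]  # returns True when the invariant holds


def no_skipped_tests(files: Files) -> bool:
    """Example invariant: nothing under tests/ is marked as skipped."""
    return not any(
        "pytest.mark.skip" in text
        for path, text in files.items()
        if path.startswith("tests/")
    )


def run_invariants(files: Files, invariants: dict[str, Invariant]) -> dict[str, bool]:
    """Run every maintainer-defined check and report pass/fail by name."""
    return {name: check(files) for name, check in invariants.items()}


snapshot = {"tests/test_api.py": "@pytest.mark.skip\ndef test_limit(): ..."}
print(run_invariants(snapshot, {"no_skipped_tests": no_skipped_tests}))
# {'no_skipped_tests': False}
```

The point of the design is ownership: the checks live outside the agent's context, so the agent cannot satisfy them by rewording its own summary.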

`graveyard`

Carry previous failed patterns and recent stop-attempt signals into new sessions as structured warnings.

Known mistakes are less likely to repeat silently.
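
A sketch of how past failures might be condensed into warnings for the next session, assuming each failure is stored as a record with `gate` and `reason` fields. The record shape and the `build_warnings` helper are assumptions for illustration only.

```python
from collections import Counter


def build_warnings(past_failures: list[dict], limit: int = 3) -> list[str]:
    """Summarize the most frequent past failures as short warning lines."""
    counts = Counter((f["gate"], f["reason"]) for f in past_failures)
    return [
        f"{gate}: {reason} (seen {n}x)"
        for (gate, reason), n in counts.most_common(limit)
    ]


history = [
    {"gate": "challenges", "reason": "missing boundary coverage"},
    {"gate": "requirements", "reason": "evidence mismatch"},
    {"gate": "challenges", "reason": "missing boundary coverage"},
]
for warning in build_warnings(history):
    print(warning)
```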

`stop_policy`

Combine all evidence and return a deterministic outcome.

If evidence is there, the session closes. If evidence is missing, Cortex can keep it open, flag a repeated approach, or preserve an honest stuck state.
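
The four outcomes described above can be sketched as a pure function over gate results. The `Outcome` names, the `decide` signature, and the repeat threshold are hypothetical; the ordering of the checks is one plausible reading, not Cortex's documented policy.

```python
from enum import Enum


class Outcome(Enum):
    CLOSE = "close"
    KEEP_OPEN = "keep_open"
    REPEATED_APPROACH = "repeated_approach"
    STUCK = "stuck"


def decide(
    gates: dict[str, bool],
    same_failure_streak: int = 0,
    agent_reports_stuck: bool = False,
    repeat_threshold: int = 3,  # assumed value, purely illustrative
) -> Outcome:
    """Same inputs always give the same outcome; no prose is interpreted."""
    # An empty gate set is never treated as success.
    if gates and all(gates.values()):
        return Outcome.CLOSE
    if agent_reports_stuck:
        return Outcome.STUCK
    if same_failure_streak >= repeat_threshold:
        return Outcome.REPEATED_APPROACH
    return Outcome.KEEP_OPEN


print(decide({"challenges": False, "requirements": True}))  # Outcome.KEEP_OPEN
print(decide({"challenges": True, "requirements": True}))   # Outcome.CLOSE
```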

`store`

Persist the verified result, diagnostics, artifacts, and failure memory for the next session.

Verification should leave memory behind so the same misses do not reset every session.
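
A minimal persistence sketch using Python's standard `sqlite3` module. The `stop_results` table, its columns, and both helper functions are invented for this example; the page only says that Cortex keeps a database, not how it is laid out.

```python
import json
import sqlite3


def save_result(conn: sqlite3.Connection, session_id: str, outcome: str, diagnostics: list[dict]) -> None:
    """Store one session's outcome and its diagnostics (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stop_results ("
        "session_id TEXT PRIMARY KEY, outcome TEXT NOT NULL, diagnostics TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO stop_results VALUES (?, ?, ?)",
        (session_id, outcome, json.dumps(diagnostics)),
    )
    conn.commit()


def load_failures(conn: sqlite3.Connection) -> list[dict]:
    """Return diagnostics from every session that did not close cleanly."""
    rows = conn.execute(
        "SELECT diagnostics FROM stop_results WHERE outcome != 'close' ORDER BY session_id"
    ).fetchall()
    return [record for (raw,) in rows for record in json.loads(raw)]


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
save_result(conn, "s1", "keep_open", [{"gate": "challenges", "reason": "missing boundary coverage"}])
save_result(conn, "s2", "close", [])
print(load_failures(conn))
```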

what changes in practice

Before and After

The difference is simple. Do gaps get caught before the session closes, or after, by a person?

Without Cortex

Confident summary, hidden gaps

Agent reports: "All requirements implemented and verified. Tests passing. Task complete."
A boundary test was described but never executed.
An edge case was acknowledged in prose but never enforced in code.

With Cortex

Structured evidence, clear recovery

challenges gate fails: boundary_values=false
requirements gate fails: claimed test artifact/hash mismatch
Session stays open. The agent gets specific missing evidence, fixes the work, reconsiders the approach if it is repeating itself, and closes only when the evidence is real.



gate=challenges
category=boundary_values
pass=false
reason=missing verifiable boundary coverage

gate=requirements
claim=test_file_modified
pass=false
reason=evidence mismatch
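
If you want to consume diagnostics like these in your own tooling, a small parser is enough. This sketch assumes one `key=value` pair per line and that each record starts at a `gate=` line; `parse_diagnostics` is a hypothetical helper, not part of the Cortex CLI.

```python
def parse_diagnostics(text: str) -> list[dict[str, str]]:
    """Split key=value diagnostic lines into one dict per gate record."""
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, value = line.split("=", 1)
        if key == "gate" or not records:
            records.append({})  # a gate= line opens a new record
        records[-1][key] = value
    return records


sample = """
gate=challenges
category=boundary_values
pass=false
reason=missing verifiable boundary coverage

gate=requirements
claim=test_file_modified
pass=false
reason=evidence mismatch
"""
for record in parse_diagnostics(sample):
    print(record["gate"], record["pass"], record["reason"])
```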

get started

Get Started

Keep your workflow. Add Cortex at the session boundary.

Install CLI

Install the Cortex CLI (requires Python 3.11+). Most users should use pipx; the second command is the uv alternative.
```
pipx install cortex-loop
```
```
uv tool install cortex-loop
```
In your project folder

Create the Cortex config, install the default baseline policy, and connect Claude runtime support.
```
cortex init --profile claude
```
Verify setup

Run a verification check to confirm the config, runtime adapter, and database are ready.
```
cortex check
```

Runtime enforcement status

Claude Code: Verified
Enforcement: Strongest current runtime
The evidence boundary has been proven in live use. Remaining caveats are minor.

Gemini CLI: Coming soon*
Enforcement: Waiting on Google-side hook fixes
Support is queued, but it will not ship broadly until the Google-side hook bugs are fixed.

OpenAI Codex App Server: Supported
Enforcement: App Server bridge available now
OpenAI support is available now. Claude remains the strongest current path, but the App Server bridge is ready to use.

Contribute

Cortex is maintained by one person and kept small on purpose. I care more about reliable verification than feature count. The goal is simple: make the path of least resistance the path of best practice.

The highest-impact contribution path is packs. If you know what "done right" looks like in your domain, you can encode it as enforcement that runs every session.

I built Cortex after repeatedly having to verify agent output by hand on real projects. What I wanted was not a harsher finish line. It was a completion boundary that makes evidence easier than bluffing and makes honest failure visible when the work is not ready yet. If that problem matters to you too, I'd value the help.

Discussions

Questions, integration talk, and design discussion.

Issues

Bug reports, hardening reports, proposals, and reproducible failure cases.

Security Advisory

Private channel for security reports.

GitHub Repo

Kernel code, docs, roadmap, and release history.
