Open-source · MIT · v0.12.15 — Desktop App is here

Agents that
think in tiers.

One prompt becomes a self-organizing hierarchy of AI agents that plan, delegate, and execute in parallel — auto-routing every step to the cheapest model that's best at it. Up to 90% cheaper than running everything on one frontier model.

$ npm install -g cascade-ai click to copy ✓ copied
or download the desktop app
fetching latest release…
90%
cheaper vs. all-frontier
3
tiers, fully parallel
6
model providers
0
config to start

Three tiers. One coherent output.

Every prompt flows through a hierarchy. T1 plans, spawns T2 managers in parallel, each of which orchestrates T3 workers — all streaming back into a single final answer.

T1

Administrator

Analyzes complexity · Selects models · Decomposes into sections · Compiles final output

active
dispatches in parallel
T2 — Section 1
Auth module refactor
T2 — Section 2
Test generation
T2 — Section 3
Open pull request
spawn workers
T3
Read files
T3
Write JWT logic
T3
Run tests
T3
Write test file
T3
git commit & push

Complexity determines the tier count.

Cascade classifies your prompt before dispatching — simple questions go direct to a T3 worker, complex implementations spin up a full hierarchy.

Classification Example Route T2 Managers
Simple "What is a closure?"
T3
Moderate "Add pagination to the users API"
T2 T3 ×2
1
Complex "Refactor auth module to JWT, add tests, open PR"
T1 T2 ×3 T3 ×n
3–5
Highly Complex "Research, benchmark, and document the full auth ecosystem"
T1 T2 ×5+ T3 ×n
5+

Works with every major model provider

Anthropic Claude
OpenAI GPT
Google Gemini
Azure OpenAI
Ollama (local)
OpenAI-Compatible

Production-grade from day one.

No plugin store to browse. The tools your agents need are already wired in.

New · v0.6
📊

Live Benchmark Auto-Routing

Set any tier to Auto and Cascade picks the best-value model per task — fusing live public benchmark scores with live OpenRouter pricing. Cheap models win trivial work; frontier models win the hard parts.

New · v0.8
🤖

Autonomous Mode

Type /auto on for hands-off runs: the plan auto-approves and safe tools run without prompts — while dangerous tools still ask and budget caps stay the hard stop.

New · v0.7
📋

Boardroom Plan Review

Pause before any worker spawns to review T1's plan — an AI reviewer critiques it, you drop sections inline or add a steering note, and it re-plans until you approve.

New · v0.9
⏯️

Run Resumability

Hit the budget cap on a big task? /continue resumes with a raised budget — files already created persist on disk, so only the remaining work runs. No redo.

New · v0.10.1
👥

Workers Recruit Help

A worker that discovers its task should fan out asks its manager to spawn bounded sibling workers on the fly — dynamic parallelism, no rigid up-front plan, no runaway recursion.

Live Agent Tree

Watch the T1→T2→T3 hierarchy execute in real time directly in the terminal via ink rendering.

🔒

Permission Escalation

Dangerous tool calls escalate through T2 → T1 → user before executing. Never a silent file delete.

🔄

Provider Failover

Rate-limit hit? Cascade auto-switches providers with exponential backoff. Zero config required.

🛠️

Full Tool Suite

Shell, file CRUD, git, GitHub/GitLab PRs, Playwright browser automation, PDF creation, code interpreter.

🌐

Web Dashboard

React + ReactFlow live topology graph, session browser, cost tracker, JWT auth, WebSocket updates.

🔌

MCP Support

Connect any Model Context Protocol server. Its tools become available to every T3 worker automatically.

💰

Per-Tier Cost Breakdown & Budget Control

Every result exposes costByTier, tokensByTier, and percentage attribution. Set a live session budget with /budget set 0.50 — Cascade warns you at 80% spend (configurable via warnAtPct) and stops new tasks the moment the cap is hit, with no config-file edits required.

⌨️

Guided Setup Wizard

First-run TUI collects API keys for every provider — including multiple Azure deployments and custom OpenAI-compatible endpoints. Fetches live model lists, then assign T1/T2/T3 models or let Cascade Auto decide.

🖥️

Claude Code-Style CLI

Redesigned terminal UI with a top status bar showing live tier models and cost, a compact agent tree for T1→T2→T3 progress, and a keyboard hint bar — all purpose-built for Cascade's multi-tier hierarchy.

🎛️

Interactive Model Picker

Run /model inside the REPL for a three-step picker — provider → tier → model — with Auto at every step. Arrow keys, Tab, j/k and number keys all work; selections write .cascade/config.json and hot-swap the live router, no restart required.

Task Cancellation via AbortSignal

Pass an AbortSignal to cascade.run() to stop any in-progress run mid-flight. All active tiers (T1 → T2 → T3) halt at the next safe checkpoint before the next LLM call — no mid-stream interruptions, no orphaned agents. A run:cancelled event fires with partial output so you can still surface what was produced. Prevents runaway token spend on long tasks.

cascade.config.json
// .cascade/config.json
{
  "version": "1.0",
  "providers": [
    { "type": "anthropic",
      "apiKey": "sk-ant-..." },
    { "type": "ollama" }
  ],
  "models": {
    "t1": "claude-opus-4",
    "t2": "claude-sonnet-4",
    "t3": "llama3.2:3b"
  },
  "tools": {
    "shellBlocklist": ["rm -rf"],
    "requireApprovalFor": ["shell"]
  }
}

Embed in any
Node.js project.

Cascade exposes a first-class TypeScript SDK. Bring your own approval flow, stream tokens to any UI, or wire it into a CI pipeline.

Full TypeScript types for every option and result

Token-by-token streaming via callback

Custom approval callbacks for tool gating

Per-tier cost & token breakdown — costByTier, tokensByTier, and percentage attribution in every result

Live budget management — /budget set <$amount> caps session spend at runtime; /budget shows a visual spend bar; proactive warning fires at 80% (configurable warnAtPct) before the hard stop

runCascade, createCascade, streamCascade — three entry points

example.ts
import { streamCascade } from 'cascade-ai';

await streamCascade(
  'Refactor auth module to use JWT, add tests, open a PR',
  (token) => process.stdout.write(token),
  {
    workspacePath: '/my/project',
    approvalCallback: async (req) => {
      console.log(`Allow ${req.toolName}?`);
      return { approved: true, always: false };
    },
  }
);

One command away.

Open-source, MIT licensed. No telemetry by default. Runs local models via Ollama — your code never leaves your machine if you don't want it to.