Effortless benchmarking
while you build.

You’re too busy to write evaluation scripts by hand. Point your baseURL at Checkstack and let it shadow-test 160+ models in the background while you build. Find the right model without the setup headache.

$ npx checkstack start

Includes $1 in free shadow-testing credits.

164+ Models
0% API Markup
< 30s Setup Time
Per-Call Cost Tracking
Field-Level Accuracy

Stop writing eval scripts. Just change your baseURL.

Spin up the Checkstack sidecar locally, point your SDK to port 3456, and let our engine automatically grade cheaper models in the background while you test your app.

01. Start the proxy

Auth happens automatically. Shadow models receive traffic immediately.

$ npx checkstack run

02. Point your SDK at Checkstack

One line change. Checkstack proxies every call transparently.

baseURL: "http://localhost:3456/v1"
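
As a rough sketch of what that one-line change amounts to, the snippet below builds a chat request aimed at the local proxy. It assumes the proxy exposes an OpenAI-compatible `/chat/completions` route under `/v1`, which this page does not state explicitly. `buildChatRequest` is a hypothetical helper written for illustration, not part of Checkstack.

```typescript
// Assumption: the Checkstack proxy speaks an OpenAI-compatible API under /v1.
const CHECKSTACK_BASE_URL = "http://localhost:3456/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the HTTP request without sending it.
function buildChatRequest(
  baseURL: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const request = buildChatRequest(CHECKSTACK_BASE_URL, "claude-3.5-sonnet", [
  { role: "user", content: "What's the weather in Austin?" },
]);
// To send it: await fetch(request.url, request.init);
```

If you use an SDK that accepts a `baseURL` option, setting that option to the same address should have the same effect, and the rest of your code stays unchanged.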

03. Use your app normally

Every prompt is silently replayed against the models you choose. An AI judge scores each response.

> POST /api/chat → 200 OK
+ [claude-3.5-sonnet] get_weather({...})
gemini-flash ✓ match $0.0002
claude-haiku ✓ match $0.0014
gpt-4o-mini ✓ match $0.0003
> POST /api/chat → 200 OK
+ [claude-3.5-sonnet] search_places({...})
gemini-flash ✓ match $0.0004
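
To make the "match" idea in the log above concrete, here is a minimal sketch of a field-by-field comparison between the primary model's tool call and a shadow model's tool call. This is a deterministic stand-in for illustration only: the page says an AI judge does the scoring, and how that judge works is not described here. The `ToolCall` shape and `fieldLevelScore` are hypothetical names.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Fraction of the primary call's argument fields that the shadow call
// reproduces exactly. Returns 0 when the tool names differ.
// Note: values are compared via JSON.stringify, so nested objects with
// the same fields in a different key order count as a mismatch.
function fieldLevelScore(primary: ToolCall, shadow: ToolCall): number {
  if (primary.name !== shadow.name) return 0;
  const keys = Object.keys(primary.args);
  if (keys.length === 0) return 1;
  const matching = keys.filter(
    (key) => JSON.stringify(primary.args[key]) === JSON.stringify(shadow.args[key]),
  );
  return matching.length / keys.length;
}

const primaryCall: ToolCall = { name: "get_weather", args: { city: "Austin", unit: "F" } };
const exactShadow: ToolCall = { name: "get_weather", args: { city: "Austin", unit: "F" } };
const partialShadow: ToolCall = { name: "get_weather", args: { city: "Austin", unit: "C" } };
const wrongTool: ToolCall = { name: "search_places", args: { city: "Austin" } };

const exactScore = fieldLevelScore(primaryCall, exactShadow);     // 1
const partialScore = fieldLevelScore(primaryCall, partialShadow); // 0.5
const wrongToolScore = fieldLevelScore(primaryCall, wrongTool);   // 0
```

A strict "✓ match" would then correspond to a score of 1, while partial scores show which fields a cheaper model gets wrong.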

04. Stop and compare

Hit Ctrl+C. Get a cost and accuracy breakdown instantly. Switch to the winner.

Session complete · 4 calls
MODEL                 MATCH   COST/10k
claude-3.5-sonnet *   100%    $234
gemini-2.0-flash      100%    $8
claude-haiku          75%     $62
💡 Switch to gemini-flash → save $226 per 10k runs
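
The savings figure in that summary is simple arithmetic, sketched below with the numbers from the session table. The selection rule used here (the cheapest model that meets a minimum match rate) is an assumption about how a "winner" might be chosen, and `pickWinner` and `savingsPer10k` are hypothetical names.

```typescript
interface ModelResult {
  model: string;
  matchRate: number;  // 0..1
  costPer10k: number; // USD per 10,000 calls
}

// Figures copied from the session summary; the first entry is the primary model.
const sessionResults: ModelResult[] = [
  { model: "claude-3.5-sonnet", matchRate: 1.0, costPer10k: 234 },
  { model: "gemini-2.0-flash", matchRate: 1.0, costPer10k: 8 },
  { model: "claude-haiku", matchRate: 0.75, costPer10k: 62 },
];

// Assumed rule: the cheapest model whose match rate meets the threshold.
function pickWinner(results: ModelResult[], minMatchRate: number): ModelResult | undefined {
  let winner: ModelResult | undefined;
  for (const result of results) {
    if (result.matchRate < minMatchRate) continue;
    if (winner === undefined || result.costPer10k < winner.costPer10k) winner = result;
  }
  return winner;
}

function savingsPer10k(primary: ModelResult, candidate: ModelResult): number {
  return primary.costPer10k - candidate.costPer10k;
}

const primaryModel = sessionResults[0];
const winner = pickWinner(sessionResults, 1.0);
const savings = winner ? savingsPer10k(primaryModel, winner) : 0; // 234 - 8 = 226
```

With a 100% match requirement, claude-haiku is excluded despite being cheaper than the primary, and gemini-2.0-flash wins at $8 per 10k calls.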

05. Rerun to verify (Optional)

Not convinced? Re-run the exact same traffic with multiple repeats to check consistency before you commit.

$ checkstack rerun chk_9a2f18x --repeat 3
Re-running 4 calls × 3 repeats...
get_weather     ✓ 3/3
search_places   ✓ 3/3
get_tee_times   ✓ 3/3
book_tee_time   ✓ 3/3
Consistency: 12/12 (100%) across 3 runs
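
The consistency line is a plain tally: 4 calls × 3 repeats gives 12 runs, and all 12 matched. A minimal sketch of that calculation follows; the data structure and the `consistency` function are hypothetical, not Checkstack's own.

```typescript
// Pass/fail outcome of each repeat, keyed by the call that was replayed.
type RepeatOutcomes = Record<string, boolean[]>;

interface ConsistencySummary {
  passed: number;
  total: number;
  percent: number; // rounded to a whole number
}

function consistency(outcomes: RepeatOutcomes): ConsistencySummary {
  let passed = 0;
  let total = 0;
  for (const call of Object.keys(outcomes)) {
    for (const ok of outcomes[call]) {
      total += 1;
      if (ok) passed += 1;
    }
  }
  const percent = total === 0 ? 0 : Math.round((passed / total) * 100);
  return { passed, total, percent };
}

// The four calls from the rerun output, each matching on all three repeats.
const rerun: RepeatOutcomes = {
  get_weather: [true, true, true],
  search_places: [true, true, true],
  get_tee_times: [true, true, true],
  book_tee_time: [true, true, true],
};

const summary = consistency(rerun); // { passed: 12, total: 12, percent: 100 }
```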

06. Optimize your prompt (Optional)

If a cheaper model almost passes, Checkstack can suggest prompt tweaks to close the gap - without changing your logic.

$ checkstack rerun chk_9a2f18x --suggest
Analyzing 1 failed call...
⚠ search_places: haiku returns 2 results vs 3
💡 Suggested: add "Return at least 3 results" to system prompt
Estimated match rate: 75% → 100%
✓ Ready: Run --apply to update codebase
Why Checkstack

How we compare.

There's nothing else quite like Checkstack. It's the only tool that brings real-time parallel shadow testing to your local dev environment, letting you instantly test against 160+ models with zero API key setup.

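Real-Time Testing
  • Checkstack: Live Shadow on Every Call
  • LangSmith / Braintrust: No (Offline Eval Suites)
  • Helicone / Portkey: No (Logging / A/B Routing)
  • Promptfoo: No (Static Batch Runs)

Models Available
  • Checkstack: 160+ via Checkstack Proxy, No Keys
  • LangSmith / Braintrust: Bring Your Own Keys
  • Helicone / Portkey: Bring Your Own Keys
  • Promptfoo: Bring Your Own Keys

Integration
  • Checkstack: 1-Line baseURL Swap
  • LangSmith / Braintrust: Custom SDK + Tracing Setup
  • Helicone / Portkey: 1-Line baseURL Swap
  • Promptfoo: Config Files Required

Test Data
  • Checkstack: Real App Traffic
  • LangSmith / Braintrust: Prod Traces / Datasets
  • Helicone / Portkey: Production Logs
  • Promptfoo: Hardcoded Test Cases

Prod Risk
  • Checkstack: Zero (Pre-Prod Only)
  • LangSmith / Braintrust: Medium (SDK in Prod)
  • Helicone / Portkey: High (Inline Proxy)
  • Promptfoo: Zero (Static Only)

Pricing
  • Checkstack: $15/mo + At-Cost Usage
  • LangSmith / Braintrust: Free Tier, Then $39–249/mo
  • Helicone / Portkey: Usage-Based ($79/mo+)
  • Promptfoo: Free (Funded by OpenAI)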

Checkstack vs LangSmith or Braintrust

  • Real-time model comparison on every request vs running offline eval suites as a separate step
  • One env var change vs custom SDK integration and tracing instrumentation
  • Tests against your real local dev traffic, no dataset curation needed
  • Runs pre-production so broken prompts never reach users

Checkstack vs Helicone or Portkey

  • Parallel shadow testing across models vs single-model A/B routing or logging only
  • No production proxy in the critical path, zero risk of gateway downtime
  • $15/mo platform fee vs usage-based pricing starting at $79/mo+
  • 160+ models through the Vercel AI Gateway with no API keys to manage

Checkstack vs Promptfoo

  • Real-time shadow testing on live traffic vs batch runs against static YAML test cases
  • Automatic schema-aware field-level accuracy scoring with no eval config to write
  • Independent and vendor-neutral (Promptfoo is funded by OpenAI)
  • No API keys needed: proxies 160+ models through the Vercel AI Gateway

How Shadow Testing Works

  1. Point your local dev environment at the Checkstack proxy
  2. Every AI call is forwarded to your primary model and shadowed to cheaper alternatives
  3. Get field-level accuracy scores, latency, and cost comparisons in your terminal
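
The flow above can be sketched as a small fan-out: send the prompt to the primary model, replay it against each shadow model in parallel, and compare the outputs. This is an illustrative sketch under stated assumptions, not Checkstack's implementation. The models are stubs rather than real API calls, the comparison is plain string equality rather than AI-graded field-level scoring, and every name here is hypothetical. A real proxy would presumably return the primary response to your app without waiting for the shadows.

```typescript
type ModelCaller = (prompt: string) => Promise<string>;

interface NamedModel {
  name: string;
  call: ModelCaller;
}

interface ShadowResult {
  model: string;
  output: string;
  matchesPrimary: boolean;
}

interface ShadowReport {
  primaryOutput: string;
  shadows: ShadowResult[];
}

async function shadowTest(
  prompt: string,
  primary: NamedModel,
  shadows: NamedModel[],
): Promise<ShadowReport> {
  // Step 1: the primary model answers the request.
  const primaryOutput = await primary.call(prompt);
  // Step 2: the same prompt is replayed against every shadow model in parallel.
  const results = await Promise.all(
    shadows.map(async (shadow): Promise<ShadowResult> => {
      const output = await shadow.call(prompt);
      // Step 3: compare each shadow output with the primary output.
      return { model: shadow.name, output, matchesPrimary: output === primaryOutput };
    }),
  );
  return { primaryOutput, shadows: results };
}

// Stub models that return a fixed reply, standing in for real API calls.
const stubModel = (name: string, reply: string): NamedModel => ({
  name,
  call: async () => reply,
});
```

For example, `shadowTest("hi", stubModel("primary", "A"), [stubModel("cheap-1", "A"), stubModel("cheap-2", "B")])` resolves to a report in which `cheap-1` matches the primary and `cheap-2` does not.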
Pricing

Zero markup. You pay what the compute costs.

Checkstack passes the exact token cost to you, to the eighth decimal. No hidden fees, no per-seat pricing, no surprises.
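
As a worked example of what "exact token cost, to the eighth decimal" could mean in practice, the sketch below prices a single call from its token counts. The per-million-token prices are made-up illustrative values, not any provider's real rates, and `callCost` is a hypothetical function.

```typescript
interface Pricing {
  inputPerMillion: number;  // USD per 1,000,000 input tokens
  outputPerMillion: number; // USD per 1,000,000 output tokens
}

// Cost of one call in USD, rounded to eight decimal places, with no markup added.
function callCost(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  const raw =
    (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) / 1e6;
  return Number(raw.toFixed(8));
}

// Illustrative prices only (assumption, not a real rate card).
const examplePricing: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 };

// 1,234 input tokens and 56 output tokens:
// (1234 * 0.15 + 56 * 0.60) / 1,000,000 = 218.7 / 1,000,000 = 0.0002187
const cost = callCost(1234, 56, examplePricing);
const formatted = cost.toFixed(8); // "0.00021870"
```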

The $1 Bet

Sign up and get $1.00 of real API compute free. Find a cheaper model and save, or confirm your stack is already optimal. You win either way.

Claim Your $1 →
Checkstack Pro
$15/month

14-day free trial. No credit card required.

  • CLI proxy for live shadow testing
  • Real-time accuracy dashboard
  • 164+ models across 24 providers
  • AI-graded accuracy with field-level scores
  • Exportable production middleware
  • $1 of evaluation credit included with trial
Start Free Trial →
Evaluation Credit
$0 markup

Pay-as-you-go. Top up when you need to.

  • Purchase credit as a one-time top-up
  • Exact token cost, zero markup, ever
  • Top-up options: $10, $25, or $50
  • Credit expires 1 year after purchase
  • Covers thousands of evals on Eco-tier
  • Unused credit carries forward

Available after signing up.

Model Catalog

Test any of 164 models across 24 providers.

The benchmarks compare 6 popular models. When you run your own evals you can pick any model from the full catalog. Compare frontier models, open-source options, and cost-optimized choices side-by-side on your exact data.

OpenAI: 34 models total
GPT-3.5 Turbo, GPT-4o, GPT-4o Mini, GPT-4.1, GPT-4.1 Mini, GPT-4.1 Nano, GPT-5 Mini, GPT-5 Nano, o1, o3, o4-mini, +23 more

We benchmarked 50 of these models so you don't have to.

Accuracy, latency, and cost across JSON extraction, RAG audit, tool calling, and more.

View Benchmark →

Stop guessing.
Start measuring.

Sign up, run your task across 164+ models, and know exactly what you're paying for - and whether it's worth it.

Claim Your $1 and Start Free →
