Whitepaper · v0.1 · June 2026
A routing layer for open inference
One OpenAI-compatible API that scores every provider on cost, latency, and reliability, routes each request to the cheapest healthy one, and fails over automatically — so inference takes the path of least resistance.
Open-model inference is sold by dozens of providers at prices that differ by multiples for the identical model, and that move week to week. Developers integrate one provider, inherit its price and its outages, and rarely re-shop. Current is a thin routing layer in front of the whole market: a single OpenAI-compatible endpoint that continuously scores providers and dispatches each request to the best one, with transparent pricing and automatic failover. This document describes the problem, the architecture, the routing score, the economics, and the path toward an open routing network.
01A fragmented, opaque market
The same open model — say a 70B-class chat model — is served today by many providers, each with its own price, latency profile, rate limits, and reliability. Prices for the identical model routinely differ by 2–5× across providers, and they change as capacity and competition shift. There is no single “market price” for a token; there is a spread.
Yet the typical integration is static: a developer wires their SDK to one provider, hard-codes a base URL and a key, and ships. From that moment they pay that provider’s price and absorb that provider’s outages and rate limits — even when a cheaper, healthier provider is serving the same model a few milliseconds away. Re-shopping means code changes, new accounts, new keys, and new failure modes, so almost nobody does it. The spread is real money left on the table, every request.
02Current in one line
Current is one OpenAI-compatible endpoint that scores every provider on cost, latency, and reliability, routes each request to the best one, and fails over automatically — one API, one bill, the cheapest route. You change a base URL; everything downstream is unchanged.
The design goal is to be invisible infrastructure: drop-in compatibility, transparent decisions (never a black box), and a price that is always the live market floor plus a small, fixed routing fee.
03Architecture
Current is a stateless gateway in front of a live registry and a routing engine.
- OpenAI-compatible gatewayAccepts standard /v1 chat, completions, and embeddings requests. A request names a model (or "auto"); the gateway resolves it to a concrete provider deployment. Existing OpenAI SDKs work by changing only base_url and key.
- Capability registryA continuously-updated map of which providers serve which models, at what price, with what context window and features — plus live health and latency observations. This is the menu the router chooses from.
- Routing engineFor each request it computes a score per eligible provider (§04) and dispatches to the lowest (best). The full breakdown is exposed on the response, so any decision can be audited.
- Failover (the eddy)If a provider rate-limits, errors, or stalls before the stream begins, the engine transparently retries the next-best provider. The blip never reaches the caller.
- Metering & billingUsage is metered per request in micro-dollars against a prepaid balance. Cost is exact, not estimated; the balance hard-stops at zero.
04The routing score
Every eligible provider is scored on three normalized axes — cost, latency, and reliability — combined by tunable weights. The lowest score wins:
score(p) = w_cost · price(p) / price_ref
+ w_lat · latency(p) / latency_ref
+ w_rel · (1 − reliability(p))
route → argmin_p score(p) # lowest cost-of-choice winsWeights default to cost-first but are tunable per API key or per request, so a latency-sensitive workload can pay for speed while a batch job optimizes purely for price. Because the breakdown is returned with each response, routing is explainable rather than magic — you can always see why a provider was chosen and what the runner-up would have cost.
05Economics
You pay the provider’s live price plus a flat routing fee per million tokens — no subscriptions, no per-model markup. Because Current always routes to the cheapest healthy channel and the spread between providers is wide, the routed price is typically well below a fixed single-provider integration, often by 30% or more, net of the fee.
Balances are prepaid and topped up with card or stablecoin; spend is exact and capped by the balance. The fee funds the routing layer itself — the incentive is aligned with finding you the cheapest route, not with steering volume to any particular provider.
06Reliability & failover
Provider health is observed continuously and folded into the score, so a degrading provider is deprioritized before it fails. When a dispatch does fail pre-stream — a rate limit, a 5xx, a timeout — the engine reroutes to the next-best provider within the same request. Uptime is the product: the caller sees one reliable endpoint, not the churn behind it.
07Privacy & security
API keys are hashed at rest and shown only once. Current logs identifiers, latency, and cost — the data needed to route and bill — and not prompt or completion content. Compatibility is drop-in, so secrets and data paths stay under your control; only the base URL changes.
08Toward a routing network
The protocol grows in phases, from an aggregator today to an open routing layer:
- Phase 1 — One channel (shipping)A single OpenAI-compatible endpoint over every provider, a live capability registry, and routing you can see through.
- Phase 2 — Routing that learnsContinuous cost optimization, failover that reroutes around real provider health, and per-request analytics.
- Phase 3 — Open the networkBring your own provider, list community providers, and earn a share of the traffic you carry.
- Phase 4 — The network layerThe default routing layer for open-model inference — decentralized onboarding and on-chain reputation for every channel.
09Conclusion
Inference is becoming a commodity, and commodities are won by the layer that connects demand to the cheapest reliable supply — not by owning the supply. Current is that layer: one endpoint, transparent routing, the live price floor, and automatic failover. Integrate once; let every request take the path of least resistance.