Inference Tokens

Understand inference token spend and how it is calculated.

Inference tokens make model usage visible. They show how much context an app sends, how much of a response comes back, and how that usage fits within the boundary you set.

What counts

Inference spend is based on request usage. Input tokens represent the prompt, conversation context, tool instructions, or other content sent to the model. Output tokens represent the response returned by the model.

Input: Prompt and context

The app sends instructions, conversation history, tool context, or files that are converted into model input.

Output: Delivered response

The model returns generated text or structured output. Longer responses spend more output tokens.

Route: Active price

Ando applies the active pricing route for the model path used by the request.

Report: Usage trail

The spend is attached to the account, app, and connection so the usage trail can show where it came from.
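
As a rough illustration of the four parts above, the sketch below models one request's usage record and sums spend by account, app, or connection. The `UsageRecord` fields, the `spend_by` helper, and all numbers are hypothetical and chosen for this example only; they are not Ando's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical usage-trail entry; every field name is an assumption."""
    account: str
    app: str
    connection: str
    route: str          # pricing route applied to the request
    input_tokens: int   # prompt, history, tool context, files
    output_tokens: int  # generated text or structured output
    spend: int          # inference tokens charged for the request


def spend_by(records, key):
    """Sum spend per value of one attribution field, e.g. "app"."""
    totals = defaultdict(int)
    for record in records:
        totals[getattr(record, key)] += record.spend
    return dict(totals)


# Made-up records for illustration.
trail = [
    UsageRecord("acme", "support-bot", "conn-1", "standard", 1200, 300, 2400),
    UsageRecord("acme", "support-bot", "conn-2", "standard", 800, 150, 1400),
    UsageRecord("acme", "research-agent", "conn-3", "premium", 5000, 900, 17200),
]

print(spend_by(trail, "app"))
# {'support-bot': 3800, 'research-agent': 17200}
```

Grouping by a different field, such as `"connection"`, answers the same question at a finer grain.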

How spend is calculated

Ando calculates spend from delivered model usage and the active pricing route for that model path. A request with more input context, a longer output, or a more expensive model route will spend more inference tokens.
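
The relationship described above can be sketched as a small function. This is a minimal sketch that assumes spend is linear in token counts, with separate per-token rates for input and output. The page does not state the exact formula, so `PricingRoute`, `estimate_spend`, and the rates shown are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PricingRoute:
    """Hypothetical route: inference tokens charged per model token."""
    name: str
    input_rate: Decimal
    output_rate: Decimal


def estimate_spend(input_tokens, output_tokens, route):
    """Estimate spend, assuming a linear per-token pricing model."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return input_tokens * route.input_rate + output_tokens * route.output_rate


# Made-up rates for illustration only.
standard = PricingRoute("standard", Decimal("1"), Decimal("4"))

print(estimate_spend(1200, 300, standard))  # 1200*1 + 300*4 = 2400
```

`Decimal` is used here so that fractional rates do not pick up floating-point rounding error.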

Why spend changes

Spend can rise when an app sends long conversation history, includes large tool instructions, requests longer outputs, retries a flow, or uses a more expensive model route. Two requests with similar prompts can still spend differently if their output length or routed model path differs.
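
To make these drivers concrete, the sketch below compares one baseline request against three variations: a longer output, a more expensive route, and a retried flow. It assumes linear per-token rates and that every retry attempt delivers usage and is charged in full. The function name, rates, and token counts are all hypothetical.

```python
def request_spend(input_tokens, output_tokens, input_rate, output_rate, attempts=1):
    """Spend for one request under an assumed linear pricing model.

    Assumes each attempt is charged in full; real retry billing may differ.
    """
    per_attempt = input_tokens * input_rate + output_tokens * output_rate
    return per_attempt * attempts


# Same 1,000-token prompt in every case; only one factor changes at a time.
scenarios = {
    "baseline": request_spend(1000, 200, input_rate=1, output_rate=4),
    "longer output": request_spend(1000, 800, input_rate=1, output_rate=4),
    "pricier route": request_spend(1000, 200, input_rate=3, output_rate=12),
    "three attempts": request_spend(1000, 200, input_rate=1, output_rate=4, attempts=3),
}

for label, spend in scenarios.items():
    print(f"{label:>14}: {spend}")
```

With these made-up numbers the baseline spends 1800, the longer output 4200, and both the pricier route and the retried flow 5400, even though the prompt never changes.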

How to manage spend

Keep context deliberate, set clear boundaries, and review usage when an app's request pattern changes. Use inference optimization for everyday routing, then add account-level or connection-level caps where a tighter boundary is needed.

Keep context deliberate

Send the history and tool detail the task needs, not everything the app has seen.

Cap long-running flows

Use connection-level caps for agents and tools that can loop or retry.

Review output length

Unexpectedly long responses can be a spend signal, even when prompts look small.

Use optimization

Let Ando keep everyday requests on a deliberate model path.
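
The first two practices above can be sketched on the client side. This is an illustration under stated assumptions, not Ando's API: `trim_history`, `ConnectionCap`, and `rough_token_count` are hypothetical helpers, the word-count tokenizer is a crude stand-in, and a real cap would be configured in Ando rather than in application code.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined size fits max_tokens."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


class CapExceeded(Exception):
    """Raised when a charge would push spend past the cap."""


class ConnectionCap:
    """Hypothetical local guard that mirrors a connection-level cap."""

    def __init__(self, limit):
        self.limit = limit
        self.spent = 0

    def charge(self, amount):
        if self.spent + amount > self.limit:
            raise CapExceeded(f"cap of {self.limit} would be exceeded")
        self.spent += amount


def rough_token_count(message):
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(message.split())


history = [
    "system: be brief",
    "user: first question here",
    "assistant: first answer",
    "user: follow up",
]
recent = trim_history(history, max_tokens=6, count_tokens=rough_token_count)
print(recent)  # ['assistant: first answer', 'user: follow up']

cap = ConnectionCap(limit=5000)
steps_completed = 0
try:
    for _ in range(10):   # an agent loop that could otherwise keep going
        cap.charge(1800)  # made-up spend per step
        steps_completed += 1
except CapExceeded:
    pass
print(steps_completed, cap.spent)  # 2 3600
```

Dropping the oldest messages first is only one trimming policy. An app that depends on a system instruction would likely pin that message and trim the rest.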
