Inference Tokens
Understand inference token spend and how it is calculated.
Inference tokens make model work visible. They show how much context an app sends, how much response comes back, and how that usage fits within the limits you set.
What counts
Inference spend is based on request usage. Input tokens represent the prompt, conversation context, tool instructions, or other content sent to the model. Output tokens represent the response returned by the model.
The app sends instructions, conversation history, tool context, or files that are converted into model input.
The model returns generated text or structured output. Longer responses spend more output tokens.
Ando applies the active pricing route for the model path used by the request.
The spend is attached to the account, app, and connection so the usage trail can show where it came from.
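The attribution described above can be pictured as a usage record that carries its account, app, and connection. The following is a minimal sketch under assumptions: the `UsageRecord` class, its field names, and the sample values are hypothetical and do not describe Ando's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical fields that mirror the attribution described in the text.
    account: str
    app: str
    connection: str
    input_tokens: int   # content sent to the model
    output_tokens: int  # content returned by the model
    spend: float        # inference tokens spent, already priced by the route


def spend_by_connection(records):
    """Sum spend per (account, app, connection) so each total has a clear source."""
    totals = defaultdict(float)
    for record in records:
        key = (record.account, record.app, record.connection)
        totals[key] += record.spend
    return dict(totals)


# Illustrative values only.
records = [
    UsageRecord("acme", "support-bot", "chat", 1200, 300, 2.5),
    UsageRecord("acme", "support-bot", "chat", 800, 500, 3.0),
    UsageRecord("acme", "support-bot", "search-tool", 400, 100, 1.0),
]
totals = spend_by_connection(records)
```

Grouping by the full key keeps two connections of the same app separate, which is what lets a usage trail point at the connection responsible for a change.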
How spend is calculated
Ando calculates spend from delivered model usage and the active pricing route for that model path. A request with more input context, a longer output, or a more expensive model route will spend more inference tokens.
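As a rough illustration of that relationship, the sketch below assumes a simple linear model in which each pricing route has one rate for input and one for output. The `PricingRoute` class, the `request_spend` function, and every rate shown are hypothetical; the actual formula and rates are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingRoute:
    # Hypothetical rates: inference tokens charged per 1,000 model tokens.
    input_rate_per_1k: float
    output_rate_per_1k: float


def request_spend(input_tokens: int, output_tokens: int, route: PricingRoute) -> float:
    """Assumed linear model: usage on each side multiplied by that side's rate."""
    input_cost = (input_tokens / 1000) * route.input_rate_per_1k
    output_cost = (output_tokens / 1000) * route.output_rate_per_1k
    return input_cost + output_cost


# Two made-up routes; the second is twice as expensive on both sides.
standard = PricingRoute(input_rate_per_1k=1.0, output_rate_per_1k=4.0)
premium = PricingRoute(input_rate_per_1k=2.0, output_rate_per_1k=8.0)

baseline = request_spend(2000, 500, standard)
longer_output = request_spend(2000, 1500, standard)
pricier_route = request_spend(2000, 500, premium)
```

Under these assumed rates, tripling the output length and switching to the pricier route both double the spend of the baseline request, even though the prompt is identical in all three cases.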
Why spend changes
Spend can rise when an app sends long conversation history, includes large tool instructions, requests longer outputs, retries a flow, or uses a more expensive model route. Two requests with similar prompts can still spend differently if their output length or routed model path differs.
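Conversation history is an easy cause to underestimate. The sketch below assumes an app that resends the full history on every turn; the function name and token counts are illustrative, not measurements.

```python
def input_tokens_per_turn(system_tokens, user_tokens, reply_tokens, turns):
    """Input size of each turn when the whole history is resent every time."""
    sizes = []
    history = 0  # tokens accumulated from earlier user messages and replies
    for _ in range(turns):
        sizes.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
    return sizes


# Assumed sizes: 200-token instructions, 50-token messages, 150-token replies.
sizes = input_tokens_per_turn(system_tokens=200, user_tokens=50, reply_tokens=150, turns=4)
total_input = sum(sizes)
```

Each new message is the same size, yet the input sent on the fourth turn is more than three times the input sent on the first, because every earlier message and reply travels with it.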
How to manage spend
Keep context deliberate, set clear limits, and review usage when an app's usage pattern changes. Use inference optimization for everyday routing, then add account-level or connection-level caps where a tighter limit is needed.
Send the history and tool detail the task needs, not everything the app has seen.
Use connection-level caps for agents and tools that can loop or retry.
Unexpectedly long responses can signal rising spend, even when prompts look small.
Use inference optimization so Ando keeps everyday requests on a deliberate model path.
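To show why a connection-level cap matters for agents that loop or retry, here is a minimal client-side sketch. It is not Ando's cap mechanism: the `ConnectionCap` class, the `CapExceeded` exception, and the costs are hypothetical and only illustrate how a cap bounds a runaway loop.

```python
class CapExceeded(Exception):
    """Raised when a request would push a connection past its cap."""


class ConnectionCap:
    # Hypothetical guard that tracks spend for one connection.
    def __init__(self, cap: float):
        self.cap = cap
        self.spent = 0.0

    def charge(self, amount: float) -> float:
        """Record spend and return what remains, or refuse if the cap would be passed."""
        if self.spent + amount > self.cap:
            raise CapExceeded(f"cap of {self.cap} would be exceeded")
        self.spent += amount
        return self.cap - self.spent


def run_agent_loop(cap: ConnectionCap, cost_per_step: float, max_steps: int) -> int:
    """Run steps until the cap stops the loop; return the number completed."""
    steps = 0
    for _ in range(max_steps):
        try:
            cap.charge(cost_per_step)
        except CapExceeded:
            break
        steps += 1
    return steps


# An agent allowed 100 steps is stopped by a cap of 10 after three steps of cost 3.
cap = ConnectionCap(cap=10.0)
completed = run_agent_loop(cap, cost_per_step=3.0, max_steps=100)
```

The check runs before each charge, so the loop stops at the last step that still fits under the cap instead of overshooting it.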