Provider Harness
Build the app-side inference harness before adding Ando as a provider.
A provider harness is the part of your app that turns product intent into model requests. Build it once, then connect providers behind it. Ando should be one provider option inside that harness, not a one-off path scattered through the product.
The harness keeps responsibilities separate when users bring their own inference. The harness decides what the app needs. Ando decides how a user-owned connection reaches approved models and records usage.
What the harness owns
- System prompts, user messages, tool definitions, files, memory, and the workflow state that belongs to your app.
- The selected provider, encrypted key reference, selected model, default limits, and health state.
- Streaming UI, cancellation, retries, timeout policy, request IDs, and user-facing error states.
- Connection status, key rotation, usage links, and budget guidance that make inference legible.
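As one illustration of the retry and error-state responsibilities listed above, the sketch below maps an HTTP status to a user-facing state and computes a capped backoff delay. The `ErrorState` type, the function names, and the status groupings are assumptions based on general HTTP conventions, not documented Ando behavior.

```typescript
// Hypothetical user-facing states; the names are illustrative only.
type ErrorState = {
  kind: "reconnect" | "slow_down" | "try_again" | "failed";
  retryable: boolean;
};

// Map an HTTP status to a state the UI can render.
// Assumption: the provider follows common HTTP status conventions.
function toErrorState(status: number): ErrorState {
  if (status === 401 || status === 403) {
    // Credentials rejected: ask the user to reconnect or rotate the key.
    return { kind: "reconnect", retryable: false };
  }
  if (status === 429) {
    // Rate or budget limit: retry after a delay.
    return { kind: "slow_down", retryable: true };
  }
  if (status === 408 || status >= 500) {
    // Timeout or server-side failure: usually worth retrying.
    return { kind: "try_again", retryable: true };
  }
  return { kind: "failed", retryable: false };
}

// Exponential backoff with a cap; attempt counts from 0.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

Keeping this mapping in the harness means product screens handle a small set of states instead of raw status codes.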
Provider shape
Keep the app interface provider-neutral. The user can select Ando now and you can add other providers later without changing the rest of the product.
type InferenceProviderId = "ando";

type UserInferenceProvider = {
  id: InferenceProviderId;
  label: "Ando";
  baseURL: "https://inference.andoai.xyz/v1";
  encryptedApiKeyRef: string;
  selectedModel: string;
  maxTokens: number;
  streaming: boolean;
};

Do not store the raw Virtual Key in the same row you use for app settings. Store it in a secret manager or encrypted credential table, then keep only the reference and a redacted preview in normal product data.
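A minimal sketch of that split follows. It assumes the secret store is reached through a caller-supplied `writeSecret` function that returns a reference ID. `StoredCredential`, `redactKeyPreview`, and `saveVirtualKey` are hypothetical names, and the preview format is only one reasonable choice.

```typescript
// Hypothetical record shape; field names are illustrative, not an Ando schema.
type StoredCredential = {
  credentialRef: string; // opaque ID pointing into the secret store
  preview: string; // redacted form that is safe to show in settings
};

// Build a preview that reveals only the last four characters of the key.
function redactKeyPreview(rawKey: string): string {
  const visible = 4;
  if (rawKey.length <= visible * 2) {
    // Too short to reveal any part safely, so mask everything.
    return "*".repeat(rawKey.length);
  }
  return "*".repeat(8) + rawKey.slice(-visible);
}

// Write the raw key through the supplied secret writer and return only
// the parts that may live in normal product data.
async function saveVirtualKey(
  rawKey: string,
  writeSecret: (value: string) => Promise<string>, // assumed to return a reference ID
): Promise<StoredCredential> {
  const credentialRef = await writeSecret(rawKey);
  return { credentialRef, preview: redactKeyPreview(rawKey) };
}
```

The `credentialRef` value is what would go into `encryptedApiKeyRef`. The raw key never appears in the returned record.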
Request flow
1. The app shows Ando as an OpenAI-compatible provider and asks for the user's Ando Virtual Key.
2. The server calls GET /v1/models with that key and stores the provider only when the response succeeds.
3. The app converts product context into messages, tools, model, and output limits.
4. The server sends POST /v1/chat/completions. Usage stays attached to the user's Ando connection.
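The key check in the flow above could look roughly like this. The sketch assumes the models endpoint returns an OpenAI-style list body (`{ data: [{ id }] }`), which follows from the OpenAI-compatible description but is not confirmed here. `FetchLike`, `KeyCheck`, and `checkAndoKey` are hypothetical names. The fetch function is injected so the check can run against a stub.

```typescript
// Minimal fetch shape so the sketch can be exercised with a stub.
// The standard global fetch is compatible with this signature.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type KeyCheck =
  | { ok: true; modelIds: string[] }
  | { ok: false; status: number };

const ANDO_MODELS_URL = "https://inference.andoai.xyz/v1/models";

// Call GET /v1/models with the user's key. The caller stores the provider
// only when the result is ok.
async function checkAndoKey(
  virtualKey: string,
  fetchImpl: FetchLike,
): Promise<KeyCheck> {
  const response = await fetchImpl(ANDO_MODELS_URL, {
    method: "GET",
    headers: { Authorization: `Bearer ${virtualKey}` },
  });
  if (!response.ok) {
    return { ok: false, status: response.status };
  }
  // Assumed OpenAI-style list body: { data: [{ id: "..." }, ...] }.
  const body = (await response.json()) as { data?: Array<{ id?: unknown }> };
  const modelIds = (body.data ?? [])
    .map((entry) => entry.id)
    .filter((id): id is string => typeof id === "string");
  return { ok: true, modelIds };
}
```

The returned `modelIds` can populate the model picker, so the user only sees models their key can reach.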
Server boundary
Hosted apps should proxy model calls through their own backend. The browser can ask your app for a completion, but your backend should attach the user's Ando Virtual Key and call Ando.
const ANDO_BASE_URL = "https://inference.andoai.xyz/v1";

// Runs on the server: the Virtual Key is attached here and never sent to the browser.
export async function runChatWithAndo({
  virtualKey,
  model,
  messages,
}: {
  virtualKey: string;
  model: string;
  messages: Array<{ role: "system" | "user" | "assistant"; content: string }>;
}) {
  const response = await fetch(`${ANDO_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${virtualKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages,
      max_tokens: 512, // explicit output limit on every call
      temperature: 0.2,
    }),
  });

  // Surface only the status code; do not log the key or the prompt text.
  if (!response.ok) {
    throw new Error(`Ando request failed: ${response.status}`);
  }

  return response.json();
}

Self-hosted or local-only tools can keep the user's key in the local runtime when the user owns that machine. Shared production frontends should not put user Virtual Keys into client-side storage, analytics payloads, crash reports, or public logs.
Harness checklist
- Add Ando as a named provider, not as a hidden default.
- Make the base URL fixed and visible: https://inference.andoai.xyz/v1.
- Test the user's key with GET /v1/models before enabling the provider.
- Cache model choices lightly and refresh when the user retests the provider.
- Require a max_tokens or max_completion_tokens limit for every call.
- Keep request IDs so support can trace failures without storing prompt text or raw credentials.
- Provide remove, rotate, and retest actions in the same settings area where the user added Ando.
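Two of the checklist items, the required token limit and traceable request IDs, can be enforced where the request body is built. The sketch below is one way to do that. `buildChatRequest`, `HARD_MAX_TOKENS`, and the log fields are hypothetical, and the ceiling of 4096 is an arbitrary app-level choice rather than an Ando limit.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

type ChatRequestInput = {
  model: string;
  messages: ChatMessage[];
  maxTokens?: number;
  requestId: string; // generated by the caller, e.g. a UUID
};

// Hypothetical app-level ceiling; choose a value that fits your product.
const HARD_MAX_TOKENS = 4096;

// Build the request body, refusing any call without an explicit output limit.
function buildChatRequest(input: ChatRequestInput) {
  const { model, messages, maxTokens, requestId } = input;
  if (maxTokens === undefined || !Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error(`Request ${requestId}: a positive max_tokens limit is required`);
  }
  return {
    requestId,
    body: {
      model,
      messages,
      max_tokens: Math.min(maxTokens, HARD_MAX_TOKENS),
    },
    // The log line carries the ID, model, and message count only:
    // no prompt text and no credentials.
    logLine: JSON.stringify({ requestId, model, messageCount: messages.length }),
  };
}
```

Support can then trace a failure by searching for `requestId` without the logs ever holding prompt content.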