
Provider Harness

Build the app-side inference harness before adding Ando as a provider.

A provider harness is the part of your app that turns product intent into model requests. Build it once, then connect providers behind it. Ando should be one provider option inside that harness, not a one-off path scattered through the product.

The harness keeps responsibilities separate when users bring their own inference. The harness decides what the app needs. Ando decides how a user-owned connection reaches approved models and records usage.

What the harness owns

Product context

System prompts, user messages, tool definitions, files, memory, and the workflow state that belongs to your app.

Provider settings

The selected provider, encrypted key reference, selected model, default limits, and health state.

Runtime behavior

Streaming UI, cancellation, retries, timeout policy, request IDs, and user-facing error states.

Governance UX

Connection status, key rotation, usage links, and budget guidance that make inference legible.
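The retry policy and user-facing error states listed under Runtime behavior can be sketched as a few pure functions. This is a minimal sketch, not Ando's documented behavior: the status-code mapping follows common HTTP conventions and is an assumption, and `toErrorState`, `shouldRetry`, and `backoffMs` are hypothetical names.

```typescript
// Hypothetical error states the UI can render; not defined by Ando.
type ErrorState =
  | "invalid_key"
  | "rate_limited"
  | "provider_unavailable"
  | "request_rejected";

// Assumption: Ando uses conventional HTTP status codes for these cases.
function toErrorState(status: number): ErrorState {
  if (status === 401 || status === 403) return "invalid_key";
  if (status === 429) return "rate_limited";
  if (status >= 500) return "provider_unavailable";
  return "request_rejected";
}

// Retry only transient failures, and only while attempts remain.
// `attempt` is 1-based: the first call is attempt 1.
function shouldRetry(status: number, attempt: number, maxAttempts: number): boolean {
  const transient = status === 429 || status >= 500;
  return transient && attempt < maxAttempts;
}

// Exponential backoff: baseMs, 2x baseMs, 4x baseMs, ... capped at capMs.
function backoffMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}
```

Keeping these decisions in plain functions lets the streaming UI, the server proxy, and tests share one policy instead of re-deciding per call site.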

Provider shape

Keep the app interface provider-neutral. Users can select Ando now, and you can add other providers later without changing the rest of the product.

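// Provider IDs the harness knows about. Extend this union when you add a provider.
type InferenceProviderId = "ando";

type UserInferenceProvider = {
  id: InferenceProviderId;
  label: "Ando";
  baseURL: "https://inference.andoai.xyz/v1"; // fixed, not user-editable
  encryptedApiKeyRef: string; // reference into the secret store, never the raw key
  selectedModel: string;
  maxTokens: number; // output limit applied to every call
  streaming: boolean;
};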

Do not store the raw Virtual Key in the same row you use for app settings. Store it in a secret manager or encrypted credential table, then keep only the reference and a redacted preview in normal product data.
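One way to follow this rule is to hand the raw key to a secret store and persist only what comes back. The sketch below is illustrative: `StoredProviderRow`, `redactKey`, `saveAndoKey`, and the injected `putSecret` function are hypothetical names standing in for whatever secret manager or encrypted table the app already uses.

```typescript
// Hypothetical row shape for normal product data. It holds no raw credential.
type StoredProviderRow = {
  providerId: "ando";
  encryptedApiKeyRef: string; // pointer into the secret store
  keyPreview: string; // redacted form that is safe to show in settings
};

// Show only the last four characters, and nothing at all for very short keys.
function redactKey(rawKey: string): string {
  const visible = rawKey.length > 8 ? rawKey.slice(-4) : "";
  return "****" + visible;
}

// `putSecret` is an assumption: it stores the value and returns a reference.
async function saveAndoKey(
  rawKey: string,
  putSecret: (value: string) => Promise<string>,
): Promise<StoredProviderRow> {
  const ref = await putSecret(rawKey);
  return {
    providerId: "ando",
    encryptedApiKeyRef: ref,
    keyPreview: redactKey(rawKey),
  };
}
```

The returned row can be written to the settings table directly, because the raw key never appears in it.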

Request flow

01 User adds Ando

The app shows Ando as an OpenAI-compatible provider and asks for the user's Ando Virtual Key.

02 App validates access

The server calls GET /v1/models with that key and stores the provider only when the response succeeds.
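The validation step might look like the sketch below. It assumes the response body follows the OpenAI-style list shape `{ data: [{ id }] }`, which this page implies by calling Ando OpenAI-compatible but does not spell out. `validateAndoKey`, `parseModelIds`, and the `FetchLike` type are hypothetical names, and the fetch implementation is injected so the function can be exercised without a network call.

```typescript
const ANDO_BASE_URL = "https://inference.andoai.xyz/v1";

// Minimal structural type so a stub can stand in for the global fetch.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type ValidationResult =
  | { ok: true; models: string[] }
  | { ok: false; status: number };

// Assumption: the body looks like { data: [{ id: string }, ...] }.
// Anything that does not match is ignored rather than trusted.
function parseModelIds(body: unknown): string[] {
  const data = (body as { data?: unknown } | null)?.data;
  if (!Array.isArray(data)) return [];
  return data
    .map((m) => (m as { id?: unknown } | null)?.id)
    .filter((id): id is string => typeof id === "string");
}

// Calls GET /v1/models with the user's key. Store the provider only on ok: true.
async function validateAndoKey(
  virtualKey: string,
  fetchImpl: FetchLike,
): Promise<ValidationResult> {
  const res = await fetchImpl(`${ANDO_BASE_URL}/models`, {
    method: "GET",
    headers: { Authorization: `Bearer ${virtualKey}` },
  });
  if (!res.ok) return { ok: false, status: res.status };
  return { ok: true, models: parseModelIds(await res.json()) };
}
```

On a server runtime with a global `fetch`, the call would be `validateAndoKey(key, fetch)`. The returned model IDs can populate the model picker.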

03 Harness builds the request

The app converts product context into messages, tools, model, and output limits.
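As a rough illustration of that conversion, the sketch below turns a simplified product context and the stored provider settings into a chat-completions body. `ProductContext`, `ProviderSettings`, and `buildChatRequest` are hypothetical names, tools are left out for brevity, and the `stream` field is assumed from the OpenAI-compatible request shape rather than confirmed by this page.

```typescript
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Hypothetical, simplified view of what the app knows about the conversation.
type ProductContext = {
  systemPrompt: string;
  history: ChatMessage[];
  userInput: string;
};

// Subset of the stored provider settings that affects the request body.
type ProviderSettings = {
  selectedModel: string;
  maxTokens: number;
  streaming: boolean;
};

// Pure function: same context and settings always yield the same request body.
function buildChatRequest(ctx: ProductContext, settings: ProviderSettings) {
  const messages: ChatMessage[] = [
    { role: "system", content: ctx.systemPrompt },
    ...ctx.history,
    { role: "user", content: ctx.userInput },
  ];
  return {
    model: settings.selectedModel,
    messages,
    max_tokens: settings.maxTokens, // every request carries an output limit
    stream: settings.streaming, // assumption: OpenAI-style streaming flag
  };
}
```

Because the function has no provider-specific logic beyond field names, the same product context can feed a different provider adapter later.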

04 Ando runs inference

The server sends POST /v1/chat/completions. Usage stays attached to the user's Ando connection.

Server boundary

Hosted apps should proxy model calls through their own backend. The browser can ask your app for a completion, but your backend should attach the user's Ando Virtual Key and call Ando.

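// Fixed Ando endpoint; the harness never takes this from user input.
const ANDO_BASE_URL = "https://inference.andoai.xyz/v1";

// Server-side only: the backend attaches the user's Virtual Key and calls Ando.
export async function runChatWithAndo({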
  virtualKey,
  model,
  messages,
}: {
  virtualKey: string;
  model: string;
  messages: Array<{ role: "system" | "user" | "assistant"; content: string }>;
}) {
  const response = await fetch(`${ANDO_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${virtualKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages,
      max_tokens: 512,
      temperature: 0.2,
    }),
  });

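  // Surface the HTTP status so the harness can map it to a user-facing error state.
  if (!response.ok) {
    throw new Error(`Ando request failed: ${response.status}`);
  }

  // Parsed completion payload, returned to the caller as-is.
  return response.json();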
}

Self-hosted or local-only tools can keep the user's key in the local runtime when the user owns that machine. Shared production frontends should not put user Virtual Keys into client-side storage, analytics payloads, crash reports, or public logs.
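A small redaction helper can act as a safety net before anything is written to logs or crash reports. This is a minimal sketch under stated assumptions: `redactSecrets` and `headersForLog` are hypothetical helpers, and the app is assumed to know which key values are in play for the current request.

```typescript
const REDACTED = "[REDACTED]";

// Replace every occurrence of each known secret in a log line.
function redactSecrets(text: string, secrets: string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // splitting on "" would corrupt the text
    out = out.split(secret).join(REDACTED);
  }
  return out;
}

// Copy headers for logging, masking the Authorization value entirely.
function headersForLog(headers: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  Object.keys(headers).forEach((name) => {
    const value = headers[name];
    if (value === undefined) return;
    safe[name] = name.toLowerCase() === "authorization" ? REDACTED : value;
  });
  return safe;
}
```

Redaction complements the server boundary rather than replacing it: the key should still never be sent to the browser in the first place.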

Harness checklist

  • Add Ando as a named provider, not as a hidden default.
  • Make the base URL fixed and visible: https://inference.andoai.xyz/v1.
  • Test the user's key with GET /v1/models before enabling the provider.
  • Cache the model list briefly and refresh it when the user retests the provider.
  • Require a max_tokens or max_completion_tokens limit for every call.
  • Keep request IDs so support can trace failures without storing prompt text or raw credentials.
  • Provide remove, rotate, and retest actions in the same settings area where the user added Ando.
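Two of these checklist items lend themselves to small guards in code: requiring a token limit on every call and caching the model list for a short time. The sketch below is illustrative only. `hasTokenLimit` and `createModelCache` are hypothetical names, and the time-to-live is an app-level choice rather than anything Ando specifies.

```typescript
type TokenLimits = {
  max_tokens?: number;
  max_completion_tokens?: number;
};

// True only when the request body carries a positive integer output limit.
function hasTokenLimit(body: TokenLimits): boolean {
  const limit = body.max_tokens ?? body.max_completion_tokens;
  return typeof limit === "number" && Number.isInteger(limit) && limit > 0;
}

// Short-lived cache for the model list. `now` is injectable for testing.
function createModelCache(ttlMs: number, now: () => number = Date.now) {
  let entry: { models: string[]; fetchedAt: number } | null = null;
  return {
    // Returns null when nothing is cached or the entry is older than ttlMs.
    get(): string[] | null {
      if (entry === null || now() - entry.fetchedAt > ttlMs) return null;
      return entry.models;
    },
    set(models: string[]): void {
      entry = { models, fetchedAt: now() };
    },
    // Call this when the user retests, rotates, or removes the provider.
    clear(): void {
      entry = null;
    },
  };
}
```

The harness can refuse to send any body for which `hasTokenLimit` returns false, so a missing limit fails fast in the app instead of surfacing later as unexpected usage.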
