API Reference

OpenAI-compatible inference API. Powered by IBM Granite and Google Gemma on dedicated hardware.

Contents

Authentication
Base URL
POST /v1/chat/completions
Available Models
Request Parameters
Response Format
Streaming (SSE)
Rate Limits
Error Codes
Data Privacy
SDK Compatibility

Authentication

All API requests require an API key passed via the Authorization header. Generate keys at /account/keys after signing in.

Authorization: Bearer sr_your_key_here

Base URL

All endpoints are served from a single base URL. Use this as your OpenAI SDK base_url.

https://streamrift.ai/api/v1

POST /v1/chat/completions

Create a chat completion. Requests and responses follow the OpenAI chat completions API format, and both streaming (SSE) and non-streaming responses are supported. The full URL is the base URL followed by /chat/completions.

curl https://streamrift.ai/api/v1/chat/completions \
  -H "Authorization: Bearer sr_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "streamrift-fast",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "stream": true,
    "temperature": 0.7,
    "max_tokens": 2048
  }'
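The same request can be assembled in Python using only the standard library. This is a minimal sketch: build_chat_request is a hypothetical helper name, not part of any StreamRift SDK, and it only constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://streamrift.ai/api/v1"


def build_chat_request(api_key, messages, model="streamrift-fast",
                       stream=False, temperature=0.7, max_tokens=2048):
    """Build (but do not send) a POST request for /chat/completions.

    Hypothetical helper for illustration; the field names mirror the
    curl example above.
    """
    payload = {
        "model": model,
        "messages": messages,
        "stream": stream,  # set explicitly rather than relying on the default
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("sr_your_key", [{"role": "user", "content": "Hello"}])
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access or a real key.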

Available Models

model	base	context	speed	use
streamrift-fast	IBM Granite 3.3 (2.5B)	16K	Fastest	Quick tasks, code completion, chat
streamrift-thinking	Google Gemma 4 (4B)	8K–128K	Balanced	Reasoning, analysis, creative writing

Request Parameters

parameter	type	required	description
model	string	yes	"streamrift-fast" or "streamrift-thinking"
messages	array	yes	Array of {role, content} objects. Roles: "system", "user", "assistant"
stream	boolean	no	Enable SSE streaming. Default: true
temperature	number	no	0.0–2.0. Default: 0.7
top_p	number	no	Nucleus sampling. Default: 0.9
max_tokens	number	no	Maximum tokens to generate. Capped by model context.
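A client can check a payload against the documented constraints before sending it. The sketch below is illustrative only: validate_payload is a hypothetical name, the server's actual validation is not described here, and the requirement that messages be non-empty is an assumption.

```python
ALLOWED_MODELS = {"streamrift-fast", "streamrift-thinking"}
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_payload(payload):
    """Return a list of problems found in a chat completion payload.

    Checks only what the parameter list documents: the model name,
    the message roles, and the temperature range.
    """
    problems = []
    if payload.get("model") not in ALLOWED_MODELS:
        problems.append("model must be one of: " + ", ".join(sorted(ALLOWED_MODELS)))

    messages = payload.get("messages")
    # Assumption: an empty messages array is treated as invalid.
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"messages[{i}].role is invalid")
            elif "content" not in msg:
                problems.append(f"messages[{i}].content is missing")

    temperature = payload.get("temperature", 0.7)
    is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
    if not is_number or not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be a number between 0.0 and 2.0")
    return problems


print(validate_payload({"model": "gpt-4", "messages": [], "temperature": 3}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.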

Response Format

Non-streaming responses (stream: false) return a standard chat completion object.

{
  "id": "sr-1713367200000",
  "object": "chat.completion",
  "model": "streamrift-fast",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
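Reading the assistant text and token usage out of that object takes only a few lookups. The sketch below parses the sample response shown above; summarize_completion is a hypothetical helper name.

```python
import json

# The sample non-streaming response from this section.
SAMPLE = """
{
  "id": "sr-1713367200000",
  "object": "chat.completion",
  "model": "streamrift-fast",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hello! How can I help you?"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
}
"""


def summarize_completion(body):
    """Pull the reply text, finish reason, and total token count from a response body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": data["usage"]["total_tokens"],
    }


print(summarize_completion(SAMPLE))
```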

Streaming (SSE)

When stream is true, the response is delivered as Server-Sent Events. Each event carries a chunk whose delta holds partial content, and the stream ends with a data: [DONE] line.

data: {"id":"sr-1713367200001","object":"chat.completion.chunk","model":"streamrift-fast","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"sr-1713367200002","object":"chat.completion.chunk","model":"streamrift-fast","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}

data: [DONE]
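Reassembling the full reply means concatenating the delta content of each chunk until the [DONE] line arrives. The following is a minimal sketch that works on any iterable of text lines; iter_content is a hypothetical name, and reading the lines from a live HTTP response is left to the caller.

```python
import json


def iter_content(lines):
    """Yield partial content strings from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


events = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(events)))
```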

Rate Limits

Rate limits are per-account and determined by your plan. Exceeding the limit returns HTTP 429 with a Retry-After header.

plan	requests	tokens	concurrency	price
Free	10 rpm	500K tokens/mo	1 concurrent	$0
Sovereign	60 rpm	10M tokens/mo	3 concurrent	$20/mo
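A client can honor the Retry-After header by waiting before it retries a 429. The sketch below makes several assumptions: send is a hypothetical zero-argument callable that returns (status, headers, body), headers is a plain dict, and the Retry-After value is a number of seconds. The format of the header value is not specified here, so anything unparseable falls back to a default delay.

```python
import time


def send_with_retry(send, max_attempts=3, default_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send: hypothetical callable returning (status, headers, body).
    sleep: injectable so the wait can be observed or skipped in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        try:
            # Assumption: Retry-After is given in seconds.
            delay = float(headers.get("Retry-After", default_delay))
        except (TypeError, ValueError):
            delay = default_delay  # e.g. an HTTP-date value; fall back
        sleep(max(delay, 0.0))


# Simulated responses: one 429, then a success.
queue = [(429, {"Retry-After": "2"}, ""), (200, {}, "ok")]
waits = []
print(send_with_retry(lambda: queue.pop(0), sleep=waits.append), waits)
```

Capping the number of attempts keeps a client from looping indefinitely when the limit does not clear.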

Error Codes

Errors return a typed envelope so clients can branch on error.type without parsing message text.

status	type	description
401	auth_error	Missing or invalid API key.
403	api_disabled	API access requires a paid plan. Free tier is chat-UI only.
403	tier_not_available	The thinking tier is not available on the current plan.
429	rate_limit_exceeded	Requests-per-minute bucket drained. Check Retry-After header.
429	wallet_empty	Token wallet below the 500-token floor required to start a turn. Wallet refills at 1,000/24h up to a 10,000 ceiling.
502	fleet_error	Backend request failed mid-flight.
503	fleet_unavailable	No healthy backends in the pool.
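Because two different conditions share status 429 and two share 403, branching on the error type is more precise than branching on the status code. The sketch below assumes the envelope has the shape {"error": {"type": ..., "message": ...}}, which is not spelled out in this reference. The action names and the choice to retry fleet errors are illustrative suggestions, and next_action is a hypothetical helper.

```python
import json

# Illustrative mapping from documented error types to a client-side action.
ACTIONS = {
    "auth_error": "check_api_key",
    "api_disabled": "upgrade_plan",
    "tier_not_available": "upgrade_plan",
    "rate_limit_exceeded": "wait_retry_after",
    "wallet_empty": "wait_for_refill",
    "fleet_error": "retry_with_backoff",
    "fleet_unavailable": "retry_with_backoff",
}


def next_action(body):
    """Map an error response body to a suggested client action.

    Assumes the body is JSON shaped like {"error": {"type": "..."}};
    anything else yields "fail".
    """
    try:
        error_type = json.loads(body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        return "fail"
    if not isinstance(error_type, str):
        return "fail"
    return ACTIONS.get(error_type, "fail")


print(next_action('{"error": {"type": "wallet_empty", "message": "Wallet below floor"}}'))
```

Keeping the mapping in a table makes it straightforward to add a type later without touching the control flow.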

Data Privacy

StreamRift does not log, store, or retain the content of API requests or responses. Your prompts and completions exist only in transit between your client and the inference fleet. We track only usage metadata for billing: token counts, timestamps, and model selection. See our Terms of Service for full details.

SDK Compatibility

StreamRift is compatible with any OpenAI SDK client. Change the base_url parameter and use your StreamRift API key.

# Python
from openai import OpenAI
client = OpenAI(base_url="https://streamrift.ai/api/v1", api_key="sr_...")

# Node.js
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "https://streamrift.ai/api/v1", apiKey: "sr_..." });

# curl
curl https://streamrift.ai/api/v1/chat/completions -H "Authorization: Bearer sr_..."