Baseten
LiteLLM supports both Baseten Model APIs and dedicated deployments with automatic routing.
API Types
Model API (Default)
- URL: https://inference.baseten.co/v1
- Format: baseten/<model-name> (e.g., baseten/openai/gpt-oss-120b)
- Best for: Quick access to popular models
Dedicated Deployments
- URL: https://model-{id}.api.baseten.co/environments/production/sync/v1
- Format: baseten/{8-digit-alphanumeric-code} (e.g., baseten/abcd1234)
- Best for: Custom models, latency SLAs
tip
Automatic Routing: LiteLLM detects the deployment type from the model string:
- 8-digit alphanumeric codes → Dedicated deployment
- All other formats → Model API
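The detection boils down to a pattern check on the model string. A minimal sketch of the rule (the helper name and regex are illustrative, not LiteLLM's internal API):

import re

def is_dedicated_deployment(model: str) -> bool:
    # Illustrative rule: an 8-character alphanumeric ID (e.g. "abcd1234")
    # routes to a dedicated deployment; anything else uses the Model API.
    return re.fullmatch(r"[A-Za-z0-9]{8}", model) is not None

print(is_dedicated_deployment("abcd1234"))             # True  -> dedicated deployment
print(is_dedicated_deployment("openai/gpt-oss-120b"))  # False -> Model API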
Quick Start
import os
from litellm import completion
os.environ["BASETEN_API_KEY"] = "your-api-key"
# Model API (default)
response = completion(
    model="baseten/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Dedicated deployment (8-digit ID)
response = completion(
    model="baseten/abcd1234",
    messages=[{"role": "user", "content": "Hello!"}]
)
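Both calls return an OpenAI-compatible response object, so the reply text is read the same way in either case:

print(response.choices[0].message.content)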
Examples
Basic Usage
# Model API
response = completion(
    model="baseten/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    max_tokens=500,
    temperature=0.7
)
# Dedicated deployment
response = completion(
    model="baseten/abcd1234",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    max_tokens=500,
    temperature=0.7
)
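For async applications, litellm also exposes acompletion, which accepts the same parameters. A minimal sketch:

import asyncio
from litellm import acompletion

async def main():
    # Same arguments as completion(), awaited instead of called synchronously
    response = await acompletion(
        model="baseten/openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "Explain quantum computing"}],
        max_tokens=500,
    )
    print(response.choices[0].message.content)

asyncio.run(main())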
Streaming (Model API only)
response = completion(
    model="baseten/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
    stream_options={"include_usage": True}
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
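With stream_options={"include_usage": True} set, the final chunk also carries token counts in a usage field (its choices list is empty). The loop above can be extended to capture it; a sketch, assuming OpenAI-style streaming chunks:

usage = None
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    if getattr(chunk, "usage", None):
        usage = chunk.usage  # populated only on the final chunk
print(usage)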
Usage with LiteLLM Proxy
- Config (save as config.yaml):
model_list:
  - model_name: baseten-model
    litellm_params:
      model: baseten/openai/gpt-oss-120b
      api_key: your-baseten-api-key
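- Start the proxy (it serves an OpenAI-compatible API on http://0.0.0.0:4000 by default):

litellm --config config.yaml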
- Request:
import openai
client = openai.OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
    model="baseten-model",
    messages=[{"role": "user", "content": "Hello!"}]
)
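Streaming also works through the proxy when the underlying model uses the Model API, as in this config. A minimal sketch with the same client:

response = client.chat.completions.create(
    model="baseten-model",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")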