v1.78.5-stable - Native OCR Support

October 18, 2025

Krrish Dholakia

CEO, LiteLLM

Ishaan Jaff

CTO, LiteLLM

Deploy this version

Docker

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.78.5-stable

Pip

pip install litellm==1.78.5

Key Highlights

Native OCR Endpoints - Native /v1/ocr endpoint support with cost tracking for Mistral OCR and Azure AI OCR
Global Vendor Discounts - Specify global vendor discount percentages for accurate cost tracking and reporting
Team Spending Reports - Team admins can now export detailed spending reports for their teams
Claude Haiku 4.5 - Day 0 support for Claude Haiku 4.5 across Bedrock, Vertex AI, and OpenRouter with 200K context window
GPT-5-Codex - Support for GPT-5-Codex via Responses API on OpenAI and Azure
Performance Improvements - Major router optimizations: O(1) model lookups, 10-100x faster shallow copy, 30-40% faster timing calls, and O(n²) to O(n) hash generation

New Models / Updated Models

New Model Support

Provider	Model	Context Window	Input ($/1M tokens)	Output ($/1M tokens)	Features
Anthropic	`claude-haiku-4-5`	200K	$1.00	$5.00	Chat, reasoning, vision, function calling, prompt caching, computer use
Anthropic	`claude-haiku-4-5-20251001`	200K	$1.00	$5.00	Chat, reasoning, vision, function calling, prompt caching, computer use
Bedrock	`anthropic.claude-haiku-4-5-20251001-v1:0`	200K	$1.00	$5.00	Chat, reasoning, vision, function calling, prompt caching
Bedrock	`global.anthropic.claude-haiku-4-5-20251001-v1:0`	200K	$1.00	$5.00	Chat, reasoning, vision, function calling, prompt caching
Bedrock	`jp.anthropic.claude-haiku-4-5-20251001-v1:0`	200K	$1.10	$5.50	Chat, reasoning, vision, function calling, prompt caching (JP Cross-Region)
Bedrock	`us.anthropic.claude-haiku-4-5-20251001-v1:0`	200K	$1.10	$5.50	Chat, reasoning, vision, function calling, prompt caching (US region)
Bedrock	`eu.anthropic.claude-haiku-4-5-20251001-v1:0`	200K	$1.10	$5.50	Chat, reasoning, vision, function calling, prompt caching (EU region)
Bedrock	`apac.anthropic.claude-haiku-4-5-20251001-v1:0`	200K	$1.10	$5.50	Chat, reasoning, vision, function calling, prompt caching (APAC region)
Bedrock	`au.anthropic.claude-haiku-4-5-20251001-v1:0`	200K	$1.10	$5.50	Chat, reasoning, vision, function calling, prompt caching (AU region)
Vertex AI	`vertex_ai/claude-haiku-4-5@20251001`	200K	$1.00	$5.00	Chat, reasoning, vision, function calling, prompt caching
OpenAI	`gpt-5`	272K	$1.25	$10.00	Chat, responses API, reasoning, vision, function calling, prompt caching
OpenAI	`gpt-5-codex`	272K	$1.25	$10.00	Responses API mode
Azure	`azure/gpt-5-codex`	272K	$1.25	$10.00	Responses API mode
Gemini	`gemini-2.5-flash-image`	32K	$0.30	$2.50	Image generation (GA - Nano Banana) - $0.039/image
ZhipuAI	`glm-4.6`	-	-	-	Chat completions
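
To make the per-token prices in the table concrete, here is a minimal sketch of how a request cost could be estimated from them. The `PRICES` dict and `estimate_cost` helper are illustrative names, not part of LiteLLM, and the calculation deliberately ignores prompt-caching discounts and any other pricing tiers.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
# PRICES and estimate_cost are hypothetical helpers for illustration only.
PRICES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "us.anthropic.claude-haiku-4-5-20251001-v1:0": (1.10, 5.50),
    "gpt-5-codex": (1.25, 10.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from per-1M-token list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Same request on the base model id and on the US regional Bedrock id.
base = estimate_cost("claude-haiku-4-5", 200_000, 10_000)
regional = estimate_cost("us.anthropic.claude-haiku-4-5-20251001-v1:0", 200_000, 10_000)
print(f"base=${base:.4f} regional=${regional:.4f}")
```

With the listed prices, the regional Bedrock ids come out 10% more expensive than the base and global ids for the same token counts.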

Features

OpenAI
- GPT-5 return reasoning content via /chat/completions + GPT-5-Codex working on Claude Code - PR #15441
Anthropic
- Reduce claude-4-sonnet max_output_tokens to 64k - PR #15409
- Added claude-haiku-4.5 - PR #15579
- Add support for thinking blocks and redacted thinking blocks in Anthropic v1/messages API - PR #15501
Bedrock
- Add anthropic.claude-haiku-4-5-20251001-v1:0 on Bedrock, VertexAI - PR #15581
- Add Claude Haiku 4.5 support for Bedrock global and US regions - PR #15650
- Add Claude Haiku 4.5 support for Bedrock Other regions - PR #15653
- Add JP Cross-Region Inference jp.anthropic.claude-haiku-4-5-20251001 - PR #15598
- Fix: bedrock-pricing-geo-inregion-cross-region / add Global Cross-Region Inference - PR #15685
- Fix: Support us-gov prefix for AWS GovCloud Bedrock models - PR #15626
- Fix: GPT-OSS on Bedrock now supports streaming; revert fake streaming - PR #15668
Gemini
- Feat(pricing): Add Gemini 2.5 Flash Image (Nano Banana) in GA - PR #15557
- Fix: Gemini 2.5 Flash Image should not have supports_web_search=true - PR #15642
- Remove penalty params as supported params for gemini preview model - PR #15503
Ollama
- Fix(ollama/chat): correctly map reasoning_effort to think in requests - PR #15465
OpenRouter
- Add anthropic/claude-sonnet-4.5 to OpenRouter cost map - PR #15472
- Prompt caching for anthropic models with OpenRouter - PR #15535
- Get completion cost directly from OpenRouter - PR #15448
- Fix OpenRouter Claude Opus 4 model naming - PR #15495
CometAPI
- Fix(cometapi): improve CometAPI provider support (embeddings, image generation, docs) - PR #15591
Lemonade
- Adding new models to the lemonade provider - PR #15554
Watson X
- Fix (pricing): Fix pricing for watsonx model family for various models - PR #15670
Vercel AI Gateway
- Add glm-4.6 model to pricing configuration - PR #15679
Vertex AI
- Add Vertex AI Discovery Engine Rerank Support - PR #15532

Bug Fixes

Anthropic
- Fix: Pricing for Claude Sonnet 4.5 in US regions is 10x too high - PR #15374
OpenRouter
- Change gpt-5-codex support in model_price json - PR #15540
Bedrock
- Fix filtering headers for signature calcs - PR #15590
General
- Add native reasoning and streaming support flag for gpt-5-codex - PR #15569

LLM API Endpoints

Features

Responses API
- Responses API - enable calling anthropic/gemini models in Responses API streaming in openai ruby sdk + DB - sanity check pending migrations before startup - PR #15432
- Add support for responses mode in health check - PR #15658
OCR API
- Feat: Add native litellm.ocr() functions - PR #15567
- Feat: Add /ocr route on LiteLLM AI Gateway - Adds support for native Mistral OCR calling - PR #15571
- Feat: Add Azure AI Mistral OCR Integration - PR #15572
- Feat: Native /ocr endpoint support - PR #15573
- Feat: Add Cost Tracking for /ocr endpoints - PR #15678
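
As a rough illustration of calling the new route, the sketch below builds (but does not send) a POST request to `/v1/ocr` on a gateway running on port 4000, as in the Docker command above. The payload shape is an assumption modelled on Mistral's OCR API (`model` plus a `document` object); the model name, API key, and document URL are placeholders, and `build_ocr_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_ocr_request(base_url: str, api_key: str, model: str, document_url: str) -> urllib.request.Request:
    """Build a POST request for the gateway's /v1/ocr route without sending it."""
    # Assumed payload shape, mirroring Mistral's OCR API; check the LiteLLM docs.
    payload = {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute your own gateway URL, virtual key, and model.
req = build_ocr_request(
    "http://localhost:4000",
    "sk-1234",
    "mistral/mistral-ocr-latest",
    "https://example.com/sample.pdf",
)
# To actually send it: urllib.request.urlopen(req)
```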
/generateContent
- Fix: GEMINI - CLI - add google_routes to llm_api_routes - PR #15500
- Fix Pydantic validation error for citationMetadata.citationSources in Google GenAI responses - PR #15592
Images API
- Fix: Dall-e-2 for Image Edits API - PR #15604
Bedrock Passthrough
- Feat: Allow calling /invoke, /converse routes through AI Gateway + models on config.yaml - PR #15618

Bugs

General
- Fix: Convert object to a correct type - PR #15634
- Bug Fix: Tags as metadata dicts were raising exceptions - PR #15625
- Add type hint to function_to_dict and fix typo - PR #15580

Management Endpoints / UI

Features

Virtual Keys
- Docs: Key Rotations - PR #15455
- Fix: UI - Key Max Budget Removal Error Fix - PR #15672
- Fix: Key Settings - Max Budget Removal Error Fix - PR #15669
Teams
- Feat: Allow Team Admins to export a report of the team spending - PR #15542
Passthrough
- Feat: Passthrough - allow admin to give access to specific passthrough endpoints - PR #15401
SCIM v2
- Feat(scim_v2.py): if group.id doesn't exist, use external id + Passthrough - ensure updates and deletions persist across instances - PR #15276
SSO
- Feat: UI SSO - Add PKCE for OKTA SSO - PR #15608
- Fix: Separate OAuth M2M authentication from UI SSO + Handle Introspection endpoint for Oauth2 - PR #15667
- Fix: Entra ID app roles JWT claim cleanup - PR #15583

Logging / Guardrail / Prompt Management Integrations

Guardrails

General
- Fix apply_guardrail endpoint returning raw string instead of ApplyGuardrailResponse - PR #15436
- Fix: Ensure guardrail memory sync after database updates - PR #15633
- Feat: add guardrail for image generation - PR #15619
- Feat: Add Guardrails for /v1/messages and /v1/responses API - PR #15686
Pillar Security
- Feature: update pillar security integration to support no persistence mode in litellm proxy - PR #15599

Prompt Management

General
- Small fix code snippet custom_prompt_management.md - PR #15544

Spend Tracking, Budgets and Rate Limiting

Cost Tracking
- Feat: Cost Tracking - specify a global vendor discount for costs - PR #15546
- Feat: UI - Allow setting Provider Discounts on UI - PR #15550
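
The arithmetic behind a vendor discount can be sketched as below. This is only an illustration of the idea, not LiteLLM's implementation: the `VENDOR_DISCOUNTS` mapping, the provider keys, and the choice to express discounts as fractions (0.05 for 5%) are all assumptions.

```python
# Hypothetical discount table: provider name -> fractional discount (0.05 == 5%).
VENDOR_DISCOUNTS = {"vertex_ai": 0.05, "openai": 0.10}


def apply_vendor_discount(provider: str, list_cost: float) -> float:
    """Return the cost after the provider's negotiated discount, if any."""
    discount = VENDOR_DISCOUNTS.get(provider, 0.0)
    if not 0.0 <= discount <= 1.0:
        raise ValueError(f"discount for {provider!r} must be between 0 and 1")
    return list_cost * (1.0 - discount)


# A $100 list-price spend on a provider with a 5% discount, and one with none.
discounted = apply_vendor_discount("vertex_ai", 100.0)
unchanged = apply_vendor_discount("anthropic", 2.0)
```

Providers without an entry fall back to a zero discount, so tracked spend for them stays at list price.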
Budgets
- Fix: improve budget clarity - PR #15682

Performance / Loadbalancing / Reliability improvements

Router Optimizations
- Perf(router): use shallow copy instead of deepcopy for model aliases - 10-100x faster than deepcopy on nested dict structures - PR #15576
- Perf(router): optimize string concatenation in hash generation - Improves time complexity from O(n²) to O(n) - PR #15575
- Perf(router): optimize model lookups with O(1) data structures - Replace O(n) scans with index map lookups - PR #15578
- Perf(router): optimize model lookups with O(1) index maps - Use model_id_to_deployment_index_map and model_name_to_deployment_indices for instant lookups - PR #15574
- Perf(router): optimize timing functions in completion hot path - Use time.perf_counter() for duration measurements and time.monotonic() for timeout calculations, providing 30-40% faster timing calls - PR #15617
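
The techniques named in these entries can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the router's actual code: the map names follow the PR description, while the deployment dicts, `get_deployment_by_id`, `get_deployments_by_name`, and `hash_parts` are hypothetical.

```python
import copy
import hashlib
import time

# Hypothetical deployment list; the field layout is illustrative only.
deployments = [
    {"model_name": "gpt-5", "model_info": {"id": "a1"}, "litellm_params": {"model": "openai/gpt-5"}},
    {"model_name": "gpt-5", "model_info": {"id": "b2"}, "litellm_params": {"model": "azure/gpt-5"}},
    {"model_name": "claude-haiku-4-5", "model_info": {"id": "c3"}, "litellm_params": {"model": "anthropic/claude-haiku-4-5"}},
]

# Build both index maps once, in a single O(n) pass over the deployments.
model_id_to_deployment_index_map = {}
model_name_to_deployment_indices = {}
for idx, dep in enumerate(deployments):
    model_id_to_deployment_index_map[dep["model_info"]["id"]] = idx
    model_name_to_deployment_indices.setdefault(dep["model_name"], []).append(idx)


def get_deployment_by_id(model_id):
    """Average-case O(1) dict lookup instead of scanning the whole list."""
    idx = model_id_to_deployment_index_map.get(model_id)
    return deployments[idx] if idx is not None else None


def get_deployments_by_name(model_name):
    """Return every deployment registered under a model name."""
    return [deployments[i] for i in model_name_to_deployment_indices.get(model_name, [])]


def hash_parts(parts):
    """Join once (O(n)) rather than growing a string with += inside a loop."""
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


# A shallow copy duplicates only the top-level dict; nested values are shared.
aliases = {"fast": {"model": "claude-haiku-4-5"}}
shallow = copy.copy(aliases)

# perf_counter() for measuring durations, monotonic() for timeout deadlines.
start = time.perf_counter()
deadline = time.monotonic() + 5.0
found = get_deployment_by_id("b2")
elapsed = time.perf_counter() - start
timed_out = time.monotonic() > deadline
```

The trade-off with a shallow copy is that nested values are shared with the original, so it is only safe where callers do not mutate those nested structures.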
SSL/TLS Performance
- Feat(ssl): add configurable ECDH curve for TLS performance - Configure via ssl_ecdh_curve setting to disable PQC on OpenSSL 3.x for better performance - PR #15617
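
For readers unfamiliar with ECDH curve selection, the sketch below shows the underlying standard-library call, `ssl.SSLContext.set_ecdh_curve`. How LiteLLM wires the `ssl_ecdh_curve` setting into its own contexts is not described in these notes, so `make_client_context` and the `prime256v1` curve name are assumptions used purely for illustration.

```python
import ssl


def make_client_context(ecdh_curve=None):
    """Create a client TLS context, optionally selecting the ECDH curve."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ecdh_curve:
        # Selects the named curve for ECDH key exchange on this context.
        ctx.set_ecdh_curve(ecdh_curve)
    return ctx


# "prime256v1" is the OpenSSL name for the NIST P-256 curve.
ctx = make_client_context("prime256v1")
```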
Token Counter
- Fix(token-counter): extract model_info from deployment for custom_tokenizer - PR #15680
Performance Metrics
- Add: perf summary - PR #15458
CI/CD
- Fix: CI/CD - Missing env key & Linter type error - PR #15606

Documentation Updates

Provider Documentation
- Litellm docs 10 11 2025 - PR #15457
- Docs: add ecs deployment guide - PR #15468
- Docs: Update benchmark results - PR #15461
- Fix: add missing context to benchmark docs - PR #15688
General
- Fixed a few typos - PR #15267

New Contributors

@jlan-nl made their first contribution in PR #15374
@ImadSaddik made their first contribution in PR #15267
@huangyafei made their first contribution in PR #15472
@mubashir1osmani made their first contribution in PR #15468
@kowyo made their first contribution in PR #15465
@dhruvyad made their first contribution in PR #15448
@davizucon made their first contribution in PR #15544
@FelipeRodriguesGare made their first contribution in PR #15540
@ndrsfel made their first contribution in PR #15557
@shinharaguchi made their first contribution in PR #15598
@TensorNull made their first contribution in PR #15591
@TeddyAmkie made their first contribution in PR #15583
@aniketmaurya made their first contribution in PR #15580
@eddierichter-amd made their first contribution in PR #15554
@konekohana made their first contribution in PR #15535
@Classic298 made their first contribution in PR #15495
@afogel made their first contribution in PR #15599
@orolega made their first contribution in PR #15633
@LucasSugi made their first contribution in PR #15634
@uc4w6c made their first contribution in PR #15619
@Sameerlite made their first contribution in PR #15658
@yuneng-jiang made their first contribution in PR #15672
@Nikro made their first contribution in PR #15680

Full Changelog

View complete changelog on GitHub
