[Infra] Model fallback chain — resilient Ollama calls with model cascade #79

Closed
opened 2026-03-20 14:54:47 -04:00 by perplexity · 1 comment
Owner

Overview

Wrap all Ollama calls in a fallback chain so the system stays alive when the primary model is unavailable. No cloud APIs — local only.

Fallback Chain

| Priority | Model | Notes |
|----------|-------|-------|
| 1 | Configured model (e.g. `hermes3`) | Primary |
| 2 | `llama3.2` | Common fallback, usually available |
| 3 | First available model from `GET /api/tags` | Last resort |

If all local models fail, return a graceful error instead of crashing.
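The priority table can be expressed as a small pure function. This is a sketch only; the helper name and signature are illustrative, with `available` standing in for the model names returned by `GET /api/tags`:

```python
def resolve_fallback_chain(configured: str, available: list[str]) -> list[str]:
    """Build the ordered list of models to try, per the priority table."""
    chain = [configured]                  # priority 1: configured model
    if "llama3.2" not in chain:           # priority 2: common fallback
        chain.append("llama3.2")
    for model in available:               # priority 3: first installed model
        if model not in chain:
            chain.append(model)
            break
    return chain
```

With `hermes3` configured and `mistral` installed, the chain resolves to `["hermes3", "llama3.2", "mistral"]`; if every candidate fails, the caller returns the graceful error rather than crashing.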

Changes

server/ollama_client.py (NEW)

Resilient Ollama HTTP client:

  • OllamaClient(base_url, models=["hermes3", "llama3.2"])
  • async generate(prompt, system, json_mode) -> str
    • Tries each model in order
    • On connection error or model-not-found, tries next
    • Caches which model is available (refreshes every 5 min)
  • async list_models() -> list[str] — GET /api/tags
  • async health() -> bool — GET /api/version
  • Logs which model actually served each request
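A minimal sketch of the cascade and cache logic described above. The HTTP call is injected (`post_fn`) so the fallback behavior can be exercised without a live Ollama; the real implementation would wrap `httpx.AsyncClient`, and the exception types and field names here are assumptions:

```python
import logging
import time

log = logging.getLogger("ollama_client")

class AllModelsFailed(Exception):
    """Every model in the chain errored; callers degrade gracefully."""

class OllamaClient:
    CACHE_TTL = 300.0  # seconds; "refreshes every 5 min" per the spec

    def __init__(self, base_url, models, post_fn):
        self.base_url = base_url.rstrip("/")
        self.models = list(models)
        self._post = post_fn        # async (url, payload) -> str (injected)
        self._good_model = None     # cached model that served the last request
        self._good_at = 0.0

    async def generate(self, prompt, system="", json_mode=False):
        chain = list(self.models)
        # Prefer the recently successful model while the cache is fresh
        if self._good_model in chain and time.monotonic() - self._good_at < self.CACHE_TTL:
            chain.remove(self._good_model)
            chain.insert(0, self._good_model)
        errors = {}
        for model in chain:
            payload = {"model": model, "prompt": prompt,
                       "system": system, "stream": False}
            if json_mode:
                payload["format"] = "json"
            try:
                text = await self._post(f"{self.base_url}/api/generate", payload)
            except (ConnectionError, LookupError) as exc:
                errors[model] = exc  # connection error or model-not-found
                continue
            self._good_model, self._good_at = model, time.monotonic()
            log.info("request served by %s", model)
            return text
        raise AllModelsFailed(errors)
```

`list_models()` and `health()` would be thin wrappers over `GET /api/tags` and `GET /api/version` on the same injected transport.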

server/research.py

Replace raw httpx Ollama calls with OllamaClient.

server/bridge.py

Expose the OllamaClient instance on the bridge for shared use.
Emit a system_status message on model fallback events.
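The fallback notification could be built like this. Only the `system_status` message type comes from the issue; the builder name and field names are assumptions for illustration:

```python
import json

def fallback_status_message(from_model: str, to_model: str) -> str:
    """Serialize the system_status message sent over the bridge
    when a request falls back to a secondary model."""
    return json.dumps({
        "type": "system_status",
        "event": "model_fallback",
        "from_model": from_model,  # model that failed
        "to_model": to_model,      # model that actually served the request
    })
```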

Acceptance Criteria

  • Primary model failure cascades to fallback
  • Available model cached + refreshed periodically
  • Graceful degradation when Ollama is completely down
  • Logging shows which model served each request
  • No cloud API calls — local only per user decision
  • Tests with mocked model availability
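The last criterion can be covered without a running Ollama by mocking the HTTP layer. A self-contained sketch using `unittest.mock.AsyncMock`, with a minimal stand-in for the client's fallback loop (names are illustrative):

```python
import asyncio
from unittest.mock import AsyncMock

async def cascade(models, post):
    """Try each model in order; return (model, reply) from the first success."""
    for model in models:
        try:
            return model, await post(model)
        except ConnectionError:
            continue
    raise RuntimeError("all local models failed")

# Mock an Ollama where the primary is down and the fallback answers.
post = AsyncMock(side_effect=[ConnectionError("hermes3 unreachable"),
                              "reply from llama3.2"])
served_by, reply = asyncio.run(cascade(["hermes3", "llama3.2"], post))
assert served_by == "llama3.2"
assert post.await_count == 2  # both models were attempted, in order
```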

Closes

  • #51 (Model Fallback Chain — Hermes → DeepSeek) — updated: local-only, no DeepSeek
Author
Owner

Resolved in PR #81 (feat/automation-sprint). All tests passing.

Reference: perplexity/the-matrix#79