[Infra] Model fallback chain — resilient Ollama calls with model cascade #79

Closed
opened 2026-03-20 14:54:47 -04:00 by perplexity · 1 comment
Owner

Overview

Wrap all Ollama calls in a fallback chain so the system stays alive when the primary model is unavailable. No cloud APIs — local only.

Fallback Chain

| Priority | Model | Notes |
|----------|-------|-------|
| 1 | Configured model (e.g. `hermes3`) | Primary |
| 2 | `llama3.2` | Common fallback, usually available |
| 3 | First available model from `GET /api/tags` | Last resort |

If all local models fail, return a graceful error instead of crashing.
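The priority table can be expressed as a small pure function. This is a sketch only; the helper name and signature are illustrative, with `available` standing in for the model names returned by `GET /api/tags`:

```python
def resolve_fallback_chain(configured: str, available: list[str]) -> list[str]:
    """Build the ordered list of models to try, per the priority table."""
    chain = [configured]                  # priority 1: configured model
    if "llama3.2" not in chain:           # priority 2: common fallback
        chain.append("llama3.2")
    for model in available:               # priority 3: first installed model
        if model not in chain:
            chain.append(model)
            break
    return chain
```

With `hermes3` configured and `mistral` installed, the chain resolves to `["hermes3", "llama3.2", "mistral"]`; if every candidate fails, the caller returns the graceful error rather than crashing.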

Changes

server/ollama_client.py (NEW)

Resilient Ollama HTTP client:

  • OllamaClient(base_url, models=["hermes3", "llama3.2"])
  • async generate(prompt, system, json_mode) -> str
    • Tries each model in order
    • On connection error or model-not-found, tries next
    • Caches which model is available (refreshes every 5 min)
  • async list_models() -> list[str] — GET /api/tags
  • async health() -> bool — GET /api/version
  • Logs which model actually served each request
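A minimal sketch of the cascade and cache logic described above. The HTTP call is injected (`post_fn`) so the fallback behavior can be exercised without a live Ollama; the real implementation would wrap `httpx.AsyncClient`, and the exception types and field names here are assumptions:

```python
import logging
import time

log = logging.getLogger("ollama_client")

class AllModelsFailed(Exception):
    """Every model in the chain errored; callers degrade gracefully."""

class OllamaClient:
    CACHE_TTL = 300.0  # seconds; "refreshes every 5 min" per the spec

    def __init__(self, base_url, models, post_fn):
        self.base_url = base_url.rstrip("/")
        self.models = list(models)
        self._post = post_fn        # async (url, payload) -> str (injected)
        self._good_model = None     # cached model that served the last request
        self._good_at = 0.0

    async def generate(self, prompt, system="", json_mode=False):
        chain = list(self.models)
        # Prefer the recently successful model while the cache is fresh
        if self._good_model in chain and time.monotonic() - self._good_at < self.CACHE_TTL:
            chain.remove(self._good_model)
            chain.insert(0, self._good_model)
        errors = {}
        for model in chain:
            payload = {"model": model, "prompt": prompt,
                       "system": system, "stream": False}
            if json_mode:
                payload["format"] = "json"
            try:
                text = await self._post(f"{self.base_url}/api/generate", payload)
            except (ConnectionError, LookupError) as exc:
                errors[model] = exc  # connection error or model-not-found
                continue
            self._good_model, self._good_at = model, time.monotonic()
            log.info("request served by %s", model)
            return text
        raise AllModelsFailed(errors)
```

`list_models()` and `health()` would be thin wrappers over `GET /api/tags` and `GET /api/version` on the same injected transport.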

server/research.py

Replace raw httpx Ollama calls with OllamaClient.

server/bridge.py

Expose the OllamaClient instance on the bridge for shared use.
Emit a system_status message on model fallback events.
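The fallback notification could be built like this. Only the `system_status` message type comes from the issue; the builder name and field names are assumptions for illustration:

```python
import json

def fallback_status_message(from_model: str, to_model: str) -> str:
    """Serialize the system_status message sent over the bridge
    when a request falls back to a secondary model."""
    return json.dumps({
        "type": "system_status",
        "event": "model_fallback",
        "from_model": from_model,  # model that failed
        "to_model": to_model,      # model that actually served the request
    })
```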

Acceptance Criteria

  • Primary model failure cascades to fallback
  • Available model cached + refreshed periodically
  • Graceful degradation when Ollama is completely down
  • Logging shows which model served each request
  • No cloud API calls — local only per user decision
  • Tests with mocked model availability
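The last criterion can be covered without a running Ollama by mocking the HTTP layer. A self-contained sketch using `unittest.mock.AsyncMock`, with a minimal stand-in for the client's fallback loop (names are illustrative):

```python
import asyncio
from unittest.mock import AsyncMock

async def cascade(models, post):
    """Try each model in order; return (model, reply) from the first success."""
    for model in models:
        try:
            return model, await post(model)
        except ConnectionError:
            continue
    raise RuntimeError("all local models failed")

# Mock an Ollama where the primary is down and the fallback answers.
post = AsyncMock(side_effect=[ConnectionError("hermes3 unreachable"),
                              "reply from llama3.2"])
served_by, reply = asyncio.run(cascade(["hermes3", "llama3.2"], post))
assert served_by == "llama3.2"
assert post.await_count == 2  # both models were attempted, in order
```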

Closes

  • #51 (Model Fallback Chain — Hermes → DeepSeek) — updated: local-only, no DeepSeek
Author
Owner

Resolved in PR #81 (feat/automation-sprint). All tests passing.

Reference: perplexity/the-matrix#79