[Infra] Timmy Model Fallback Chain — Hermes → DeepSeek #51

Closed
opened 2026-03-19 00:01:45 -04:00 by perplexity · 2 comments
Owner

Model Fallback Chain

Timmy needs a reliable fallback chain so he doesn't go silent when the primary model chokes or the cloud API throttles.

Stack (in priority order)

| Priority | Model | Type | Notes |
|----------|-------|------|-------|
| 1 | Hermes 4.3 | Local (Ollama) | Primary, pending WiFi download |
| 2 | Hermes 3 | Local (Ollama) | Current default, modest Mac hardware |
| 3 | DeepSeek V3.2 | Cloud API | $0.28/M input, no rate limits, OpenAI-compatible |

Why This Stack

  • Local-first: Hermes models run on Ollama, sovereign, no dependency on cloud
  • DeepSeek as cloud fallback: Cheapest frontier-quality API, explicitly no rate limits, OpenAI-compatible endpoint (one-line base URL swap)
  • No Anthropic dependency: Claude Max throttling is unreliable for agent uptime
  • No Codex: Decided against adding OpenAI Codex to the system (dev tool, not an agent slot)

Implementation

  1. The Ollama config should try Hermes 4.3 first and fall back to Hermes 3 if that model is not available.
  2. The Timmy adapter (or gateway) should detect a local inference failure or timeout and route the request to the DeepSeek V3.2 API.
  3. The DeepSeek API key is stored as an environment variable; the endpoint is https://api.deepseek.com/v1 (OpenAI-compatible).
  4. The response format is identical, so no downstream changes are needed.
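
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code that shipped: the `Backend` class, the `CHAIN` list, the model tags, the `DEEPSEEK_API_KEY` variable name, and the injected `send` callable are all assumptions made for the example. The local URL is Ollama's default OpenAI-compatible endpoint; the DeepSeek URL is the one given in step 3.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Backend:
    """One slot in the fallback chain (hypothetical structure)."""
    name: str                          # label used in logs
    base_url: str                      # OpenAI-compatible base URL
    model: str                         # model tag sent to the backend (placeholder values)
    api_key_env: Optional[str] = None  # env var holding the API key, if one is needed

    def api_key(self) -> Optional[str]:
        # Read the key at call time so it never lives in config files.
        return os.environ.get(self.api_key_env) if self.api_key_env else None


# Priority order from the table: local Hermes models first, cloud last.
# Model tags and the env var name are assumptions, not confirmed values.
CHAIN = [
    Backend("hermes-4.3", "http://localhost:11434/v1", "hermes-4.3"),
    Backend("hermes-3", "http://localhost:11434/v1", "hermes3"),
    Backend("deepseek-v3.2", "https://api.deepseek.com/v1", "deepseek-chat",
            api_key_env="DEEPSEEK_API_KEY"),
]


class AllBackendsFailed(RuntimeError):
    """Raised when every backend in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    chain: Sequence[Backend],
    send: Callable[[Backend, str], str],
) -> Tuple[str, str]:
    """Try each backend in order and return (backend_name, reply_text).

    `send` performs the actual request; it is injected so the routing
    logic can be exercised without a network connection.
    """
    errors = []
    for backend in chain:
        try:
            return backend.name, send(backend, prompt)
        except Exception as exc:  # missing model, timeout, HTTP error, ...
            errors.append(f"{backend.name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))
```

Because all three backends speak the same OpenAI-compatible protocol, a single `send` implementation can serve the whole chain by switching only `base_url`, `model`, and the key.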

Acceptance Criteria

  • Fallback chain config exists in Timmy's adapter/dashboard
  • Local timeout triggers cloud fallback automatically
  • Cloud fallback is transparent to The Matrix (same message format)
  • Logging shows which model actually served each response
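
One way the last two criteria might be met together is to keep the reply shape fixed and record the serving model only in the log. The sketch below is an assumption-laden illustration: the `Reply` class, the `record_reply` helper, the logger name, and the log line format are all hypothetical.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("timmy.router")  # hypothetical logger name


@dataclass(frozen=True)
class Reply:
    """Same shape regardless of which backend answered."""
    text: str       # the message passed downstream
    served_by: str  # backend label, kept for logging and debugging


def record_reply(text: str, served_by: str, is_fallback: bool) -> Reply:
    """Wrap a reply and log which model actually produced it."""
    # Fallback responses are logged at WARNING so they stand out in the log.
    level = logging.WARNING if is_fallback else logging.INFO
    log.log(level, "served_by=%s fallback=%s chars=%d",
            served_by, is_fallback, len(text))
    return Reply(text=text, served_by=served_by)
```

Downstream consumers would read only `text`, so a cloud-served reply looks the same to them as a local one.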
perplexity added the enhancement and timmy labels 2026-03-19 00:01:45 -04:00

[triage] Scope refinement:

  • Files: Timmy config/router (wherever model selection happens)
  • Acceptance: If the primary model (Hermes on Ollama) fails or takes longer than 10 s to respond, automatically retry with DeepSeek via the Nous provider. Log every fallback event.
  • Score: 7 (scope=2, acceptance=2, alignment=3 — Timmy goes silent without this)
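
The 10-second rule in the acceptance line could be enforced along these lines. This is a sketch under stated assumptions: `call_with_timeout`, `answer`, `LOCAL_TIMEOUT_S`, and the logger name are hypothetical, and `local` and `cloud` stand in for whatever functions actually call Ollama and DeepSeek.

```python
import concurrent.futures
import logging
from typing import Callable

log = logging.getLogger("timmy.fallback")  # hypothetical logger name

LOCAL_TIMEOUT_S = 10.0  # threshold taken from the triage note


def call_with_timeout(fn: Callable[..., str], timeout_s: float, *args) -> str:
    """Run fn(*args) in a worker thread; raise TimeoutError if it is too slow.

    The worker is not killed on timeout; it keeps running in the
    background and its result is simply discarded.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"no response within {timeout_s}s") from None
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the slow worker


def answer(prompt: str,
           local: Callable[[str], str],
           cloud: Callable[[str], str],
           timeout_s: float = LOCAL_TIMEOUT_S) -> str:
    """Try the local model first; on failure or timeout, use the cloud model."""
    try:
        return call_with_timeout(local, timeout_s, prompt)
    except Exception as exc:
        log.warning("fallback event: local model failed (%s); trying cloud", exc)
        return cloud(prompt)
```

A thread-based timeout only stops waiting; it does not cancel the local request. A real client would likely also set a request-level timeout on the HTTP call itself.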
Author
Owner

Resolved in PR #81 (feat/automation-sprint). All tests passing.


Reference: perplexity/the-matrix#51