Timmy-time-dashboard/TEST_COVERAGE_ANALYSIS.md


Test Coverage Analysis — Timmy Time Dashboard

Date: 2026-03-06
Overall coverage: 63.6% (7,996 statements, 2,910 missed)
Threshold: 60% (passes, but barely)
Test suite: 914 passed, 4 failed, 39 skipped, 5 errors (35 seconds)


Current Coverage by Package

| Package | Approx. Coverage | Notes |
|---|---|---|
| spark/ | 90-98% | Best-covered package |
| timmy_serve/ | 80-100% | Small package, well tested |
| infrastructure/models/ | 42-97% | registry great, multimodal weak |
| dashboard/middleware/ | 79-100% | Solid |
| dashboard/routes/ | 36-100% | Highly uneven; some routes untested |
| integrations/ | 51-100% | Paperclip well covered; Discord weak |
| timmy/ | 0-100% | Several core modules at 0% |
| brain/ | 0-75% | client and worker very low |
| infrastructure/events/ | 0% | Completely untested |
| infrastructure/error_capture.py | 0% | Completely untested |

Priority 1 — Zero-Coverage Modules (0%)

These modules have no test coverage at all and represent the biggest risk:

| Module | Stmts | Purpose |
|---|---|---|
| src/timmy/semantic_memory.py | 187 | Semantic memory system (core agent feature) |
| src/timmy/agents/timmy.py | 165 | Main Timmy agent class |
| src/timmy/agents/base.py | 57 | Base agent class |
| src/timmy/interview.py | 46 | Interview flow |
| src/infrastructure/error_capture.py | 91 | Error capture/reporting |
| src/infrastructure/events/broadcaster.py | 67 | Event broadcasting |
| src/infrastructure/events/bus.py | 74 | Event bus |
| src/infrastructure/openfang/tools.py | 41 | OpenFang tool definitions |
| src/brain/schema.py | 14 | Brain schema definitions |

Recommendation: timmy/agents/timmy.py (165 stmts) and semantic_memory.py (187 stmts) are the highest-value targets. The events subsystem (broadcaster.py + bus.py = 141 stmts) is critical infrastructure with zero tests.


Priority 2 — Under-Tested Modules (<50%)

| Module | Cover | Stmts Missed | Purpose |
|---|---|---|---|
| brain/client.py | 14.8% | 127 | Brain client (primary brain interface) |
| brain/worker.py | 16.1% | 156 | Background brain worker |
| brain/embeddings.py | 35.0% | 26 | Embedding generation |
| timmy/approvals.py | 39.1% | 42 | Approval workflow |
| dashboard/routes/marketplace.py | 36.4% | 21 | Marketplace routes |
| dashboard/routes/paperclip.py | 41.1% | 96 | Paperclip dashboard routes |
| infrastructure/hands/tools.py | 41.3% | 27 | Tool execution |
| infrastructure/models/multimodal.py | 42.6% | 81 | Multimodal model support |
| dashboard/routes/router.py | 42.9% | 12 | Route registration |
| dashboard/routes/swarm.py | 43.3% | 17 | Swarm routes |
| timmy/cascade_adapter.py | 43.2% | 25 | Cascade LLM adapter |
| timmy/tools_intro/__init__.py | 44.7% | 84 | Tool introduction system |
| timmy/tools.py | 46.4% | 147 | Agent tool definitions |
| timmy/cli.py | 47.4% | 30 | CLI entry point |
| timmy/conversation.py | 48.5% | 34 | Conversation management |

Recommendation: brain/client.py + brain/worker.py together miss 283 statements and are the core of the brain/memory system. timmy/tools.py misses 147 statements and is the agent's tool registry — high impact.


Priority 3 — Test Infrastructure Issues

3a. Broken Tests (4 failures)

All in tests/test_setup_script.py — tests reference /home/ubuntu/setup_timmy.sh which doesn't exist. These tests are environment-specific and should either:

  • Be marked @pytest.mark.skip_ci or @pytest.mark.functional
  • Use a fixture to locate the script relative to the project
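A minimal sketch of the second option, assuming the script lives inside the repository and that the project root is the nearest parent directory containing pyproject.toml. The helper names `find_project_root` and `locate_script` are hypothetical and not part of the existing test suite.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def locate_script(start: Path, name: str = "setup_timmy.sh") -> Optional[Path]:
    """Return the script path inside the project, or None if it is absent."""
    root = find_project_root(start)
    if root is None:
        return None
    # Assumption: the script sits at the project root; adjust if it lives elsewhere.
    script = root / name
    return script if script.is_file() else None
```

In tests/test_setup_script.py this could back a guard such as `pytest.mark.skipif(locate_script(Path(__file__)) is None, reason="setup script not found")`, so the tests skip cleanly on machines without the script instead of failing.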

3b. Collection Errors (5 errors)

tests/functional/test_setup_prod.py — same issue, references a non-existent script path. Should be guarded with a skip condition.

3c. pytest-xdist Conflicts with Coverage

The pyproject.toml addopts includes -n auto --dist worksteal (xdist), but make test-cov also passes --cov flags. This causes a conflict:

pytest: error: unrecognized arguments: -n --dist worksteal

Fix: Either:

  • Remove -n auto --dist worksteal from addopts and add it only in the make test target
  • Or use -p no:xdist in the coverage targets (current workaround)

3d. Tox Configuration

tox.ini has unit and integration environments that run the exact same command — they're aliases. This is misleading:

  • unit should run -m unit (fast, no I/O)
  • integration should run -m integration (may use SQLite)
  • Consider adding a coverage tox env

3e. CI Workflow (tests.yml)

  • CI uses pip install -e ".[dev]" but the project uses Poetry — dependency resolution may differ
  • CI doesn't pass marker filters, so it runs all tests including those that may need Docker/Ollama
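  • No coverage enforcement in CI: the fail_under=60 setting in pyproject.toml has no effect unless the CI run actually collects coverage (for example with --cov, optionally with an explicit --cov-fail-under=60)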
  • No caching of Poetry virtualenvs

Priority 4 — Test Quality Gaps

4a. Missing Error-Path Testing

Many modules have happy-path tests but lack coverage for:

  • Graceful degradation paths: The architecture mandates graceful degradation when Ollama/Redis/AirLLM are unavailable, but most fallback paths are untested (e.g., cascade.py lines 563-655)
  • brain/client.py: Only 14.8% covered — connection failures, retries, and error handling are untested
  • infrastructure/error_capture.py: 0% — the error capture system itself has no tests
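The fallback behaviour described above could be exercised roughly as follows. This is a sketch only: `generate_with_fallback`, `primary`, and `fallback` are hypothetical stand-ins, not the actual cascade.py API, and the real code may degrade differently.

```python
from typing import Callable
from unittest.mock import Mock


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Hypothetical stand-in: try the primary backend, degrade to the fallback."""
    try:
        return primary(prompt)
    except (ConnectionError, TimeoutError):
        # The optional service is unavailable, so use the degraded path.
        return fallback(prompt)


def test_falls_back_when_primary_is_down() -> None:
    primary = Mock(side_effect=ConnectionError("ollama unreachable"))
    fallback = Mock(return_value="degraded answer")

    assert generate_with_fallback("hi", primary, fallback) == "degraded answer"
    primary.assert_called_once_with("hi")
    fallback.assert_called_once_with("hi")


def test_fallback_is_not_used_on_the_happy_path() -> None:
    primary = Mock(return_value="full answer")
    fallback = Mock()

    assert generate_with_fallback("hi", primary, fallback) == "full answer"
    fallback.assert_not_called()
```

Using `Mock(side_effect=...)` keeps the test independent of whether Ollama, Redis, or AirLLM is installed, which is the point of covering the degraded path in the unit suite.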

4b. No Integration Tests for Events System

The infrastructure/events/ package (broadcaster.py + bus.py) is 0% covered. This is the pub/sub backbone for the application. Tests should cover:

  • Event subscription and dispatch
  • Multiple subscribers
  • Error handling in event handlers
  • Async event broadcasting
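The four cases above might look like this in a test module. Since the real bus.py and broadcaster.py interfaces are not shown here, `MiniBus` is a hypothetical in-memory stand-in used only to make the sketch runnable; real tests would import the project's own bus instead.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, List

Handler = Callable[[Any], Awaitable[None]]


class MiniBus:
    """Hypothetical stand-in for the project's event bus."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    async def publish(self, topic: str, payload: Any) -> List[BaseException]:
        """Run all handlers concurrently and return the exceptions they raised."""
        results = await asyncio.gather(
            *(handler(payload) for handler in self._handlers[topic]),
            return_exceptions=True,
        )
        return [r for r in results if isinstance(r, BaseException)]


async def scenario() -> dict:
    bus = MiniBus()
    received: List[str] = []

    async def first(payload: str) -> None:
        received.append(f"first:{payload}")

    async def second(payload: str) -> None:
        received.append(f"second:{payload}")

    async def broken(payload: str) -> None:
        raise RuntimeError("handler failed")

    for handler in (first, broken, second):
        bus.subscribe("task.created", handler)

    errors = await bus.publish("task.created", "t1")
    return {"received": sorted(received), "errors": errors}


def test_failing_handler_does_not_block_other_subscribers() -> None:
    outcome = asyncio.run(scenario())
    assert outcome["received"] == ["first:t1", "second:t1"]
    assert len(outcome["errors"]) == 1
    assert isinstance(outcome["errors"][0], RuntimeError)
```

The single test touches subscription, multiple subscribers, a failing handler, and async dispatch at once; in practice each would likely get its own test so a failure points at one behaviour.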

4c. Security Tests Are Thin

  • tests/security/ has only 3 files totaling ~140 lines
  • src/timmy_serve/l402_proxy.py (payment gating, listed as security-sensitive) has no dedicated test file
  • CSRF tests exist but bypass/traversal tests are minimal
  • No tests for the approvals.py authorization workflow (39.1% covered)

4d. Missing WebSocket Tests

The WebSocket handler (ws_manager/handler.py) has 81.2% coverage, but the disconnect/reconnect and error paths (lines 132-147) aren't tested. For a real-time dashboard, WebSocket reliability is critical.
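A sketch of how the disconnect path could be covered without a real socket. `ConnectionRegistry` and `FakeSocket` are hypothetical stand-ins; the actual handler in ws_manager/handler.py may track connections differently.

```python
import asyncio
from typing import List, Set


class FakeSocket:
    """Test double that records sent messages or simulates a dropped client."""

    def __init__(self, alive: bool = True) -> None:
        self.alive = alive
        self.sent: List[str] = []

    async def send_text(self, message: str) -> None:
        if not self.alive:
            raise ConnectionError("client went away")
        self.sent.append(message)


class ConnectionRegistry:
    """Hypothetical stand-in for the dashboard's WebSocket manager."""

    def __init__(self) -> None:
        self.active: Set[FakeSocket] = set()

    def connect(self, socket: FakeSocket) -> None:
        self.active.add(socket)

    async def broadcast(self, message: str) -> None:
        # Iterate over a copy so dead sockets can be dropped during the loop.
        for socket in list(self.active):
            try:
                await socket.send_text(message)
            except ConnectionError:
                self.active.discard(socket)


def test_dead_socket_is_dropped_and_reconnect_works() -> None:
    async def run() -> None:
        registry = ConnectionRegistry()
        healthy, dropped = FakeSocket(), FakeSocket(alive=False)
        registry.connect(healthy)
        registry.connect(dropped)

        await registry.broadcast("tick")
        assert healthy.sent == ["tick"]
        assert registry.active == {healthy}

        # Simulate the client reconnecting with a fresh socket.
        replacement = FakeSocket()
        registry.connect(replacement)
        await registry.broadcast("tock")
        assert replacement.sent == ["tock"]
        assert healthy.sent == ["tick", "tock"]

    asyncio.run(run())
```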

4e. No Tests for timmy/agents/ Subpackage

The Agno-based agent classes (base.py, timmy.py) are at 0% coverage (222 statements). These are stubbed in conftest but never actually exercised. Even with the Agno stub, the control flow and prompt construction logic should be tested.


Priority 5 — Test Speed & Parallelism

| Metric | Value |
|---|---|
| Total wall time | ~35s (sequential) |
| Parallel (-n auto) | Would be ~10-15s |
| Slowest category | Functional tests (HTTP, Docker) |

Observations:

  • 30-second timeout per test is generous — consider 10s for unit, 30s for integration
  • The --dist worksteal strategy is good for uneven test durations
  • 39 tests are skipped (mostly due to missing markers/env) — this is expected
  • No test duration profiling is configured (consider --durations=10)

Quick Wins (High ROI, Low Effort)

  1. Fix the 4 broken tests in test_setup_script.py (add skip guards)
  2. Fix xdist/coverage conflict in pyproject.toml addopts
  3. Differentiate tox unit vs integration environments
  4. Add --durations=10 to default addopts for profiling slow tests
  5. Add --cov-fail-under=60 to CI workflow to enforce the threshold

Medium Effort, High Impact

  1. Test the events system (broadcaster.py + bus.py) — 141 uncovered statements, critical infrastructure
  2. Test timmy/agents/timmy.py — 165 uncovered statements, core agent
  3. Test brain/client.py and brain/worker.py — 283 uncovered statements, core memory
  4. Test timmy/tools.py error paths — 147 uncovered statements
  5. Test error_capture.py — 91 uncovered statements, observability blind spot

Longer Term

  1. Add graceful-degradation tests — verify fallback behavior for all optional services
  2. Expand security test suite — approvals, L402 proxy, input sanitization
  3. Add coverage tox environment and enforce in CI
  4. Align CI with Poetry — use poetry install instead of pip for consistent resolution
  5. Target 75% coverage as the next threshold milestone (currently 63.6%)
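As a rough sanity check on the 75% target, the figures quoted in this report can be combined directly. The sketch assumes the total statement count stays at 7,996 and that the five medium-effort items reach full coverage, which is an upper bound rather than a forecast.

```python
TOTAL_STATEMENTS = 7_996
MISSED_STATEMENTS = 2_910

# Uncovered statements per medium-effort item, as listed in this report.
MEDIUM_EFFORT = {
    "events (broadcaster.py + bus.py)": 141,
    "timmy/agents/timmy.py": 165,
    "brain/client.py + brain/worker.py": 283,
    "timmy/tools.py": 147,
    "error_capture.py": 91,
}


def coverage_pct(covered: int, total: int = TOTAL_STATEMENTS) -> float:
    """Coverage as a percentage, rounded to one decimal place."""
    return round(100 * covered / total, 1)


covered_now = TOTAL_STATEMENTS - MISSED_STATEMENTS
# Smallest covered count that reaches 75%, using integer ceiling division.
covered_needed = -(-TOTAL_STATEMENTS * 75 // 100)
gap = covered_needed - covered_now
best_case = covered_now + sum(MEDIUM_EFFORT.values())

print(f"current: {coverage_pct(covered_now)}%")
print(f"gap to 75%: {gap} statements")
print(f"best case after medium-effort items: {coverage_pct(best_case)}%")
```

Under these assumptions the gap to 75% is 911 statements, while the five medium-effort items cover at most 827 (about 73.9% overall), so reaching the milestone would also require progress on some of the Priority 2 modules.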

Coverage Floor Modules (Already Well-Tested)

These modules are at 95%+ and serve as good examples of testing patterns:

  • spark/eidos.py — 98.3%
  • spark/memory.py — 98.3%
  • infrastructure/models/registry.py — 97.1%
  • timmy/agent_core/ollama_adapter.py — 97.8%
  • timmy/agent_core/interface.py — 100%
  • dashboard/middleware/security_headers.py — 100%
  • dashboard/routes/agents.py — 100%
  • timmy_serve/inter_agent.py — 100%