Alexander Whitestone 3f06e7231d Improve test coverage from 63.6% to 73.4% and fix test infrastructure (#137)

2026-03-06 13:21:05 -05:00

Test Coverage Analysis — Timmy Time Dashboard

Date: 2026-03-06
Overall coverage: 63.6% (7,996 statements, 2,910 missed)
Threshold: 60% (passes, but barely)
Test suite: 914 passed, 4 failed, 39 skipped, 5 errors (35 seconds)

Current Coverage by Package

Package	Approx. Coverage	Notes
`spark/`	90–98%	Best-covered package
`timmy_serve/`	80–100%	Small package, well tested
`infrastructure/models/`	42–97%	`registry` great, `multimodal` weak
`dashboard/middleware/`	79–100%	Solid
`dashboard/routes/`	36–100%	Highly uneven — some routes untested
`integrations/`	51–100%	Paperclip well covered; Discord weak
`timmy/`	0–100%	Several core modules at 0%
`brain/`	0–75%	`client` and `worker` very low
`infrastructure/events/`	0%	Completely untested
`infrastructure/error_capture.py`	0%	Completely untested

Priority 1 — Zero-Coverage Modules (0%)

These modules have no test coverage at all and represent the biggest risk:

Module	Stmts	Purpose
`src/timmy/semantic_memory.py`	187	Semantic memory system — core agent feature
`src/timmy/agents/timmy.py`	165	Main Timmy agent class
`src/timmy/agents/base.py`	57	Base agent class
`src/timmy/interview.py`	46	Interview flow
`src/infrastructure/error_capture.py`	91	Error capture/reporting
`src/infrastructure/events/broadcaster.py`	67	Event broadcasting
`src/infrastructure/events/bus.py`	74	Event bus
`src/infrastructure/openfang/tools.py`	41	OpenFang tool definitions
`src/brain/schema.py`	14	Brain schema definitions

Recommendation: timmy/agents/timmy.py (165 stmts) and semantic_memory.py (187 stmts) are the highest-value targets. The events subsystem (broadcaster.py + bus.py = 141 stmts) is critical infrastructure with zero tests.

Priority 2 — Under-Tested Modules (<50%)

Module	Cover	Stmts Missed	Purpose
`brain/client.py`	14.8%	127	Brain client — primary brain interface
`brain/worker.py`	16.1%	156	Background brain worker
`brain/embeddings.py`	35.0%	26	Embedding generation
`timmy/approvals.py`	39.1%	42	Approval workflow
`dashboard/routes/marketplace.py`	36.4%	21	Marketplace routes
`dashboard/routes/paperclip.py`	41.1%	96	Paperclip dashboard routes
`infrastructure/hands/tools.py`	41.3%	27	Tool execution
`infrastructure/models/multimodal.py`	42.6%	81	Multimodal model support
`dashboard/routes/router.py`	42.9%	12	Route registration
`dashboard/routes/swarm.py`	43.3%	17	Swarm routes
`timmy/cascade_adapter.py`	43.2%	25	Cascade LLM adapter
`timmy/tools_intro/__init__.py`	44.7%	84	Tool introduction system
`timmy/tools.py`	46.4%	147	Agent tool definitions
`timmy/cli.py`	47.4%	30	CLI entry point
`timmy/conversation.py`	48.5%	34	Conversation management

Recommendation: brain/client.py + brain/worker.py together miss 283 statements and are the core of the brain/memory system. timmy/tools.py misses 147 statements and is the agent's tool registry — high impact.
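The prioritisation above can be reproduced by ranking the Priority 2 table on missed statements rather than on coverage percentage. The sketch below copies the figures from that table. The helper name `rank_by_missed` is illustrative only and is not part of the project.

```python
# (module, coverage %, statements missed), copied from the Priority 2 table.
UNDER_TESTED = [
    ("brain/client.py", 14.8, 127),
    ("brain/worker.py", 16.1, 156),
    ("brain/embeddings.py", 35.0, 26),
    ("timmy/approvals.py", 39.1, 42),
    ("dashboard/routes/marketplace.py", 36.4, 21),
    ("dashboard/routes/paperclip.py", 41.1, 96),
    ("infrastructure/hands/tools.py", 41.3, 27),
    ("infrastructure/models/multimodal.py", 42.6, 81),
    ("dashboard/routes/router.py", 42.9, 12),
    ("dashboard/routes/swarm.py", 43.3, 17),
    ("timmy/cascade_adapter.py", 43.2, 25),
    ("timmy/tools_intro/__init__.py", 44.7, 84),
    ("timmy/tools.py", 46.4, 147),
    ("timmy/cli.py", 47.4, 30),
    ("timmy/conversation.py", 48.5, 34),
]


def rank_by_missed(rows):
    """Sort modules so the largest absolute coverage gaps come first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


top_three = rank_by_missed(UNDER_TESTED)[:3]
brain_missed = sum(
    missed
    for name, _, missed in UNDER_TESTED
    if name in ("brain/client.py", "brain/worker.py")
)

for name, cover, missed in top_three:
    print(f"{name}: {missed} missed ({cover}% covered)")
print(f"brain client + worker: {brain_missed} missed")  # 283
```

Ranking by missed statements puts brain/worker.py, timmy/tools.py, and brain/client.py at the top, which matches the recommendation. A ranking by percentage alone would instead surface small files such as dashboard/routes/marketplace.py.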

Priority 3 — Test Infrastructure Issues

3a. Broken Tests (4 failures)

All four failures are in tests/test_setup_script.py: the tests reference /home/ubuntu/setup_timmy.sh, which doesn't exist in this environment. These tests are environment-specific and should either:

Be marked @pytest.mark.skip_ci or @pytest.mark.functional
Use a fixture to locate the script relative to the project
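A minimal sketch of the second option, locating the script relative to the project instead of hard-coding /home/ubuntu. It assumes the script sits at the repository root next to pyproject.toml, which the report does not confirm. The helper names `find_project_root` and `locate_setup_script` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


def locate_setup_script(start: Path, name: str = "setup_timmy.sh") -> Optional[Path]:
    """Return the script path, or None when the script is not present.

    Assumption: the script lives at the project root. Adjust if it does not.
    """
    root = find_project_root(start)
    if root is None:
        return None
    script = root / name
    return script if script.is_file() else None
```

A pytest fixture could call `locate_setup_script(Path(__file__))` and invoke `pytest.skip(...)` when it returns None, so the tests are skipped rather than failing on machines without the script.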

3b. Collection Errors (5 errors)

tests/functional/test_setup_prod.py has the same issue: it references a non-existent script path. It should be guarded with a skip condition.

3c. pytest-xdist Conflicts with Coverage

The pyproject.toml addopts includes -n auto --dist worksteal (xdist), but make test-cov also passes --cov flags. This causes a conflict:

pytest: error: unrecognized arguments: -n --dist worksteal

Fix (one of):

Remove -n auto --dist worksteal from addopts and pass those flags only in the make test target
Or use -p no:xdist in the coverage targets (current workaround)

3d. Tox Configuration

tox.ini has unit and integration environments that run the exact same command, so they are effectively aliases. This is misleading. Suggested changes:

unit should run -m unit (fast, no I/O)
integration should run -m integration (may use SQLite)
Consider adding a coverage tox env

3e. CI Workflow (`tests.yml`)

CI uses pip install -e ".[dev]" but the project uses Poetry; pip does not read a poetry.lock file, so the resolved dependency versions may differ from a Poetry install
CI doesn't pass marker filters, so it runs all tests including those that may need Docker/Ollama
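No coverage enforcement in CI: the fail_under=60 setting in pyproject.toml has no effect unless the CI run actually collects coverage (for example with --cov, optionally with an explicit --cov-fail-under=60)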
No caching of Poetry virtualenvs

Priority 4 — Test Quality Gaps

4a. Missing Error-Path Testing

Many modules have happy-path tests but lack coverage for:

Graceful degradation paths: The architecture mandates graceful degradation when Ollama/Redis/AirLLM are unavailable, but most fallback paths are untested (e.g., cascade.py lines 563–655)
brain/client.py: Only 14.8% covered — connection failures, retries, and error handling are untested
infrastructure/error_capture.py: 0% — the error capture system itself has no tests
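An error-path test for graceful degradation can be sketched as follows. The function `generate_with_fallback` and the exception `BackendUnavailable` are hypothetical stand-ins written for this example. They are not the code in cascade.py, whose interface the report does not show.

```python
from unittest.mock import Mock


class BackendUnavailable(Exception):
    """Hypothetical error raised when an optional service cannot be reached."""


def generate_with_fallback(primary, fallback, prompt: str) -> str:
    """Try the primary backend; degrade to the fallback if it is unavailable."""
    try:
        return primary(prompt)
    except (BackendUnavailable, ConnectionError):
        return fallback(prompt)


def test_falls_back_when_primary_is_down():
    primary = Mock(side_effect=ConnectionError("ollama unreachable"))
    fallback = Mock(return_value="degraded answer")
    assert generate_with_fallback(primary, fallback, "hi") == "degraded answer"
    fallback.assert_called_once_with("hi")


def test_fallback_not_used_when_primary_works():
    primary = Mock(return_value="full answer")
    fallback = Mock()
    assert generate_with_fallback(primary, fallback, "hi") == "full answer"
    fallback.assert_not_called()


def test_unexpected_errors_are_not_swallowed():
    # A programming error must propagate instead of silently degrading.
    primary = Mock(side_effect=ValueError("bug"))
    fallback = Mock()
    try:
        generate_with_fallback(primary, fallback, "hi")
    except ValueError:
        pass
    else:
        raise AssertionError("ValueError should propagate")
    fallback.assert_not_called()
```

The three cases cover the unavailable-service path, the happy path, and the boundary between expected and unexpected failures. In the real suite the last test would more idiomatically use `pytest.raises`.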

4b. No Integration Tests for Events System

The infrastructure/events/ package (broadcaster.py + bus.py) is 0% covered. This is the pub/sub backbone for the application. Tests should cover:

Event subscription and dispatch
Multiple subscribers
Error handling in event handlers
Async event broadcasting
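The four areas listed above might be exercised along the following lines. `ToyEventBus` is a hypothetical stand-in defined only for this sketch. The actual interface of infrastructure/events/bus.py is not shown in this report, so the method names `subscribe` and `publish` are assumptions.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, List

Handler = Callable[[Any], Awaitable[None]]


class ToyEventBus:
    """Hypothetical stand-in; NOT the API of infrastructure/events/bus.py."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.errors: List[Exception] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    async def publish(self, topic: str, payload: Any) -> int:
        """Run every handler for `topic`; return how many succeeded."""
        handlers = list(self._handlers.get(topic, []))
        results = await asyncio.gather(
            *(handler(payload) for handler in handlers), return_exceptions=True
        )
        failures = [r for r in results if isinstance(r, Exception)]
        self.errors.extend(failures)
        return len(results) - len(failures)


def test_dispatch_reaches_every_subscriber():
    async def scenario():
        bus, seen = ToyEventBus(), []

        async def first(payload):
            seen.append(("first", payload))

        async def second(payload):
            seen.append(("second", payload))

        bus.subscribe("task.created", first)
        bus.subscribe("task.created", second)
        return await bus.publish("task.created", {"id": 1}), seen

    delivered, seen = asyncio.run(scenario())
    assert delivered == 2
    assert sorted(name for name, _ in seen) == ["first", "second"]


def test_failing_handler_does_not_block_others():
    async def scenario():
        bus, seen = ToyEventBus(), []

        async def broken(payload):
            raise RuntimeError("handler bug")

        async def healthy(payload):
            seen.append(payload)

        bus.subscribe("t", broken)
        bus.subscribe("t", healthy)
        return await bus.publish("t", "payload"), seen, bus.errors

    delivered, seen, errors = asyncio.run(scenario())
    assert delivered == 1
    assert seen == ["payload"]
    assert len(errors) == 1 and isinstance(errors[0], RuntimeError)


def test_publish_without_subscribers_is_a_noop():
    assert asyncio.run(ToyEventBus().publish("nobody", None)) == 0
```

Each test drives the coroutine with `asyncio.run`, so no async test plugin is needed. If the project already uses one, the inner `scenario` functions could become the tests directly.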

4c. Security Tests Are Thin

tests/security/ has only 3 files totaling ~140 lines
src/timmy_serve/l402_proxy.py (payment gating, listed as security-sensitive) has no dedicated test file
CSRF tests exist but bypass/traversal tests are minimal
No tests for the approvals.py authorization workflow (39.1% covered)

4d. Missing WebSocket Tests

WebSocket handler (ws_manager/handler.py) has 81.2% coverage, but the disconnect/reconnect and error paths (lines 132–147) aren't tested. For a real-time dashboard, WebSocket reliability is critical.
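One way to reach disconnect and reconnect paths without a real network connection is a fake socket that fails on demand. Both classes below are hypothetical test doubles written for this sketch. They do not reproduce ws_manager/handler.py, and the method name `send_text` is an assumption about the handler's socket interface.

```python
import asyncio
from typing import List


class FakeWebSocket:
    """Test double: records sent messages, optionally fails like a dropped socket."""

    def __init__(self, broken: bool = False) -> None:
        self.broken = broken
        self.sent: List[str] = []

    async def send_text(self, message: str) -> None:
        if self.broken:
            raise ConnectionError("client went away")
        self.sent.append(message)


class ToyManager:
    """Hypothetical stand-in; NOT the code in ws_manager/handler.py."""

    def __init__(self) -> None:
        self.connections: List[FakeWebSocket] = []

    def connect(self, ws: FakeWebSocket) -> None:
        self.connections.append(ws)

    async def broadcast(self, message: str) -> None:
        # Iterate over a copy so dead clients can be removed while looping.
        for ws in list(self.connections):
            try:
                await ws.send_text(message)
            except ConnectionError:
                self.connections.remove(ws)


def test_dead_client_is_dropped_and_others_still_receive():
    manager = ToyManager()
    alive, dead = FakeWebSocket(), FakeWebSocket(broken=True)
    manager.connect(dead)
    manager.connect(alive)
    asyncio.run(manager.broadcast("tick"))
    assert alive.sent == ["tick"]
    assert manager.connections == [alive]


def test_reconnected_client_receives_later_messages():
    manager = ToyManager()
    ws = FakeWebSocket(broken=True)
    manager.connect(ws)
    asyncio.run(manager.broadcast("lost"))  # the client is dropped here
    ws.broken = False
    manager.connect(ws)  # simulate a reconnect
    asyncio.run(manager.broadcast("back"))
    assert ws.sent == ["back"]
```

The same fake-socket approach could be pointed at the real handler once its connect and broadcast entry points are known.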

4e. No Tests for `timmy/agents/` Subpackage

The Agno-based agent classes (base.py, timmy.py) are at 0% coverage (222 statements). These are stubbed in conftest but never actually exercised. Even with the Agno stub, the control flow and prompt construction logic should be tested.

Priority 5 — Test Speed & Parallelism

Metric	Value
Total wall time	~35s (sequential)
Parallel (`-n auto`)	~10–15s (estimated, not measured)
Slowest category	Functional tests (HTTP, Docker)

Observations:

30-second timeout per test is generous — consider 10s for unit, 30s for integration
The --dist worksteal strategy is good for uneven test durations
39 tests are skipped (mostly due to missing markers/env) — this is expected
No test duration profiling is configured (consider --durations=10)

Recommended Action Plan

Quick Wins (High ROI, Low Effort)

Fix the 4 broken tests in test_setup_script.py (add skip guards)
Fix xdist/coverage conflict in pyproject.toml addopts
Differentiate tox unit vs integration environments
Add --durations=10 to default addopts for profiling slow tests
Add --cov-fail-under=60 to CI workflow to enforce the threshold

Medium Effort, High Impact

Test the events system (broadcaster.py + bus.py) — 141 uncovered statements, critical infrastructure
Test timmy/agents/timmy.py — 165 uncovered statements, core agent
Test brain/client.py and brain/worker.py — 283 uncovered statements, core memory
Test timmy/tools.py error paths — 147 uncovered statements
Test error_capture.py — 91 uncovered statements, observability blind spot

Longer Term

Add graceful-degradation tests — verify fallback behavior for all optional services
Expand security test suite — approvals, L402 proxy, input sanitization
Add coverage tox environment and enforce in CI
Align CI with Poetry — use poetry install instead of pip for consistent resolution
Target 75% coverage as the next threshold milestone (currently 63.6%)
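As a rough check on the 75% milestone, the arithmetic below uses only figures from this report: the overall statement counts and the missed-statement counts of the medium-effort items. It assumes the total statement count stays at 7,996, and it treats full coverage of every listed item as an optimistic upper bound rather than a forecast.

```python
import math

TOTAL_STATEMENTS = 7_996
MISSED = 2_910

# Missed statements per medium-effort item, as listed in the action plan.
MEDIUM_EFFORT_MISSED = {
    "events (broadcaster.py + bus.py)": 141,
    "timmy/agents/timmy.py": 165,
    "brain/client.py + brain/worker.py": 283,
    "timmy/tools.py": 147,
    "error_capture.py": 91,
}


def coverage_pct(total: int, missed: int) -> float:
    """Statement coverage as a percentage."""
    return 100.0 * (total - missed) / total


def extra_statements_needed(total: int, missed: int, target_pct: float) -> int:
    """How many currently missed statements must be covered to hit the target."""
    covered = total - missed
    required = math.ceil(total * target_pct / 100.0)
    return max(0, required - covered)


current = coverage_pct(TOTAL_STATEMENTS, MISSED)
needed = extra_statements_needed(TOTAL_STATEMENTS, MISSED, 75.0)
best_case_gain = sum(MEDIUM_EFFORT_MISSED.values())
best_case = coverage_pct(TOTAL_STATEMENTS, MISSED - best_case_gain)

print(f"current coverage: {current:.1f}%")                      # 63.6%
print(f"statements to cover for 75%: {needed}")                 # 911
print(f"medium-effort items, fully covered: {best_case_gain}")  # 827
print(f"best-case coverage after them: {best_case:.1f}%")       # 73.9%
```

Under these assumptions the medium-effort items alone fall 84 statements short of the 75% milestone, so some of the Priority 1 or Priority 2 modules outside that list would also need tests.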

Coverage Floor Modules (Already Well-Tested)

These modules are at 95%+ and serve as good examples of testing patterns:

spark/eidos.py — 98.3%
spark/memory.py — 98.3%
infrastructure/models/registry.py — 97.1%
timmy/agent_core/ollama_adapter.py — 97.8%
timmy/agent_core/interface.py — 100%
dashboard/middleware/security_headers.py — 100%
dashboard/routes/agents.py — 100%
timmy_serve/inter_agent.py — 100%
