Complete the module consolidation planned in REFACTORING_PLAN.md:

Modules merged:
- work_orders/ + task_queue/ → swarm/ (subpackages)
- self_modify/ + self_tdd/ + upgrades/ → self_coding/ (subpackages)
- tools/ → creative/tools/
- chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new)
- ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new)
- agents/ + agent_core/ + memory/ → timmy/ (subpackages)

Updated across codebase:
- 66 source files: import statements rewritten
- 13 test files: import + patch() target strings rewritten
- pyproject.toml: wheel includes (28→14), entry points updated
- CLAUDE.md: singleton paths, module map, entry points table
- AGENTS.md: file convention updates
- REFACTORING_PLAN.md: execution status, success metrics

Extras:
- Module-level CLAUDE.md added to 6 key packages (Phase 6.2)
- Zero test regressions: 1462 tests passing

https://claude.ai/code/session_01JNjWfHqusjT3aiN4vvYgUk
Timmy Time — Architectural Refactoring Plan
Author: Claude (VP Engineering review)
Date: 2026-02-26
Branch: claude/plan-repo-refactoring-hgskF
Executive Summary
The Timmy Time codebase has grown to 53K lines of Python across 272
files (169 source + 103 test), 28 modules in src/, 27 route files,
49 templates, 90 test files, and 87KB of root-level markdown. It
works, but it's burning tokens, slowing down test runs, and making it hard to
reason about change impact.
This plan proposes 6 phases of refactoring, ordered by impact and risk. Each phase is independently valuable — you can stop after any phase and still be better off.
The Problems
1. Monolith sprawl
28 modules in src/ with no grouping. Eleven modules aren't even included in
the wheel build (agents, events, hands, mcp, memory, router,
self_coding, task_queue, tools, upgrades, work_orders). Some are
used by the dashboard routes but forgotten in pyproject.toml.
2. Dashboard is the gravity well
The dashboard has 27 route files (4,562 lines), 49 templates, and has become the integration point for everything. Every new feature = new route file + new template + new test file. This doesn't scale.
3. Documentation entropy
10 root-level .md files (87KB). README is 303 lines, CLAUDE.md is 267 lines,
AGENTS.md is 342 lines — with massive content duplication between them. Plus
PLAN.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md, MEMORY.md,
IMPLEMENTATION_SUMMARY.md, QUALITY_ANALYSIS.md, QUALITY_REVIEW_REPORT.md.
Human eyes glaze over. AI assistants waste tokens reading redundant info.
4. Test sprawl — and a skeleton problem
97 test files, 19,600 lines — but 61 of those files (63%) are empty
skeletons with zero actual test functions. Only 36 files have real tests
containing 471 test functions total. Many "large" test files (like
test_scripture.py at 901 lines, test_router_cascade.py at 523 lines) are
infrastructure-only — class definitions, imports, fixtures, but no assertions.
The functional/E2E directory (tests/functional/) has 7 files and 0 working
tests. Tests are flat in tests/ with no organization. Running the full suite
means loading every module, every mock, every fixture even when you only
changed one thing.
5. Unclear project boundaries
Is this one project or several? The timmy CLI, timmy-serve API server,
self-tdd watchdog, and self-modify CLI are four separate entry points that
could be four separate packages. The creative extra needs PyTorch. The
lightning module is a standalone payment system. These shouldn't live in the
same test run.
6. Wheel build doesn't match reality
pyproject.toml includes 17 modules but src/ has 28. The missing 11 modules
are used by code that IS included (dashboard routes import from hands,
mcp, memory, work_orders, etc.). The wheel would break at runtime.
7. Dependency coupling through dashboard
The dashboard is the hub that imports from 20+ modules. The dependency graph
flows inward: config is the foundation (22 modules depend on it), mcp is
widely used (12+ importers), swarm is referenced by 15+ modules. No true
circular dependencies exist (the timmy ↔ swarm relationship uses lazy
imports), but the dashboard pulls in everything, so changing any module can
break the dashboard routes.
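Importer counts like the ones quoted above can be approximated with the standard-library `ast` module. The following is a minimal sketch, not the tool that produced those numbers. The function names and the `sources` mapping (file name to source text) are assumptions for illustration. Because `ast.walk` visits function bodies, lazy imports such as the timmy ↔ swarm ones are counted too.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported anywhere in a module."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def importer_counts(sources: dict[str, str]) -> dict[str, int]:
    """Map each imported package to the number of files that import it."""
    counts: dict[str, int] = {}
    for source in sources.values():
        for name in top_level_imports(source):
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Feeding it the text of every file under `src/dashboard/` would show which packages the dashboard actually depends on.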
8. Conftest does too much
tests/conftest.py has 4 autouse fixtures that run on every single test:
reset message log, reset coordinator state, clean database, cleanup event
loops. Many tests don't need any of these. This adds overhead to the test
suite and couples all tests to the swarm coordinator.
Phase 1: Documentation Cleanup (Low Risk, High Impact)
Goal: Cut root markdown from 87KB to ~20KB. Make README human-readable. Eliminate token waste.
1.1 Slim the README
Cut README.md from 303 lines to ~80 lines:
# Timmy Time — Mission Control
Local-first sovereign AI agent system. Browser dashboard, Ollama inference,
Bitcoin Lightning economics. No cloud AI.
## Quick Start
make install && make dev → http://localhost:8000
## What's Here
- Timmy Agent (Ollama/AirLLM)
- Mission Control Dashboard (FastAPI + HTMX)
- Swarm Coordinator (multi-agent auctions)
- Lightning Payments (L402 gating)
- Creative Studio (image/music/video)
- Self-Coding (codebase-aware self-modification)
## Commands
make dev / make test / make docker-up / make help
## Documentation
- Development guide: CLAUDE.md
- Architecture: docs/architecture-v2.md
- Agent conventions: AGENTS.md
- Config reference: .env.example
1.2 De-duplicate CLAUDE.md
Remove content that duplicates README or AGENTS.md. CLAUDE.md should only contain what AI assistants need that isn't elsewhere:
- Architecture patterns (singletons, config, HTMX, graceful degradation)
- Testing conventions (conftest, fixtures, stubs)
- Security-sensitive areas
- Entry points table
Target: 267 → ~130 lines.
1.3 Archive or delete temporary docs
| File | Action |
|---|---|
| MEMORY.md | DELETE: session context, not permanent docs |
| WORKSET_PLAN.md | DELETE: use GitHub Issues |
| WORKSET_PLAN_PHASE2.md | DELETE: use GitHub Issues |
| PLAN.md | MOVE to docs/PLAN_ARCHIVE.md |
| IMPLEMENTATION_SUMMARY.md | MOVE to docs/IMPLEMENTATION_ARCHIVE.md |
| QUALITY_ANALYSIS.md | CONSOLIDATE with docs/QUALITY_AUDIT.md |
| QUALITY_REVIEW_REPORT.md | CONSOLIDATE with docs/QUALITY_AUDIT.md |
Result: Root directory goes from 10 .md files to 3 (README, CLAUDE,
AGENTS).
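To track the file-count and size targets for this phase, a small helper can report both numbers. This is a sketch under the assumption that only `.md` files sitting directly in the repository root count; `root_markdown_report` is a hypothetical name.

```python
from pathlib import Path


def root_markdown_report(repo_root: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for .md files directly in repo_root."""
    files = [p for p in repo_root.glob("*.md") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)
```

Calling `root_markdown_report(Path("."))` from the repository root gives the two values to compare against the 3-file and ~20KB targets.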
1.4 Clean up .handoff/
The .handoff/ directory (CHECKPOINT.md, CONTINUE.md, TODO.md, scripts) is
session-scoped context. Either gitignore it or move to docs/handoff/.
Phase 2: Module Consolidation (Medium Risk, High Impact)
Goal: Reduce 28 modules to ~12 by merging small, related modules into coherent packages. This directly reduces cognitive load and token consumption.
2.1 Module structure (implemented)
src/ # 14 packages (was 28)
config.py # Pydantic settings (foundation)
timmy/ # Core agent + agents/ + agent_core/ + memory/
dashboard/ # FastAPI web UI (22 route files)
swarm/ # Coordinator + task_queue/ + work_orders/
self_coding/ # Git safety + self_modify/ + self_tdd/ + upgrades/
creative/ # Media generation + tools/
infrastructure/ # ws_manager/ + notifications/ + events/ + router/
integrations/ # chat_bridge/ + telegram_bot/ + shortcuts/ + voice/
lightning/ # L402 payment gating (standalone, security-sensitive)
mcp/ # MCP tool registry and discovery
spark/ # Event capture and advisory
hands/ # 6 autonomous Hand agents
scripture/ # Biblical text integration
timmy_serve/ # L402-gated API server
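Moving modules into subpackages means every `from old_module ...` import and every `patch("old_module....")` target string has to point at the new location. The sketch below shows one way such a rewrite could be automated with regular expressions. It is not the script that was actually used, and the submodule paths in the usage note are hypothetical. Bare `import old_module` statements are deliberately left alone, because rewriting them changes the bound name and needs manual review.

```python
import re

# Old top-level module -> new dotted location, following the structure above.
MOVES = {
    "work_orders": "swarm.work_orders",
    "task_queue": "swarm.task_queue",
    "self_modify": "self_coding.self_modify",
    "self_tdd": "self_coding.self_tdd",
    "upgrades": "self_coding.upgrades",
    "tools": "creative.tools",
    "chat_bridge": "integrations.chat_bridge",
    "telegram_bot": "integrations.telegram_bot",
    "shortcuts": "integrations.shortcuts",
    "voice": "integrations.voice",
    "ws_manager": "infrastructure.ws_manager",
    "notifications": "infrastructure.notifications",
    "events": "infrastructure.events",
    "router": "infrastructure.router",
    "agents": "timmy.agents",
    "agent_core": "timmy.agent_core",
    "memory": "timmy.memory",
}

# Longest names first so a shorter name never shadows a longer one.
_NAMES = "|".join(sorted(MOVES, key=len, reverse=True))

# "from work_orders.models import X" (name must end at a dot or whitespace)
_FROM_RE = re.compile(rf"^([ \t]*from[ \t]+)({_NAMES})(?=[\s.])", re.MULTILINE)

# patch("task_queue.db.connect") and patch('task_queue.db.connect')
_PATCH_RE = re.compile(rf"""(patch\(\s*["'])({_NAMES})(?=\.)""")


def rewrite_imports(source: str) -> str:
    """Rewrite 'from' imports and patch() targets to the consolidated paths."""
    def swap(match: re.Match) -> str:
        return match.group(1) + MOVES[match.group(2)]

    return _PATCH_RE.sub(swap, _FROM_RE.sub(swap, source))
```

For example, `from work_orders.models import WorkOrder` would become `from swarm.work_orders.models import WorkOrder`. Generic names such as `tools` or `events` can collide with third-party packages, so the output of a rewrite like this should be reviewed as a diff rather than applied blindly.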
2.2 Dashboard route consolidation
27 route files → ~12 by grouping related routes:
| Current files | Merged into |
|---|---|
| agents.py, briefing.py | agents.py |
| swarm.py, swarm_internal.py, swarm_ws.py | swarm.py |
| voice.py, voice_enhanced.py | voice.py |
| mobile.py, mobile_test.py | mobile.py (delete test page) |
| self_coding.py, self_modify.py | self_coding.py |
| tasks.py, work_orders.py | tasks.py |
mobile_test.py (257 lines) is a test page route that's excluded from
coverage — it should not ship in production.
2.3 Fix the wheel build
Update pyproject.toml [tool.hatch.build.targets.wheel] to include all
modules that are actually imported. Currently 11 modules are missing from the
build manifest.
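A drift check between `src/` and the wheel include list can keep this from regressing. The sketch below assumes the include entries are directory paths such as `src/timmy` and that a package is any directory containing `__init__.py`; both are assumptions about this project's layout, and `missing_from_wheel` is a hypothetical helper.

```python
from pathlib import Path


def missing_from_wheel(src_dir: Path, wheel_includes: list[str]) -> list[str]:
    """List packages under src_dir that no wheel include entry covers."""
    # Compare on the final path component, e.g. "src/timmy" -> "timmy".
    included = {Path(entry).name for entry in wheel_includes}
    packages = sorted(
        p.name
        for p in src_dir.iterdir()
        if p.is_dir() and (p / "__init__.py").exists()
    )
    return [name for name in packages if name not in included]
```

Run in CI with the list read from pyproject.toml, a non-empty result would fail the build before a broken wheel ships.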
Phase 3: Test Reorganization (Medium Risk, Medium Impact)
Goal: Organize tests to match module structure, enable selective test runs, reduce full-suite runtime.
3.1 Mirror source structure in tests
tests/
    conftest.py                  # Global fixtures only
    timmy/                       # Tests for timmy/ module
        conftest.py              # Timmy-specific fixtures
        test_agent.py
        test_backends.py
        test_cli.py
        test_orchestrator.py
        test_personas.py
        test_memory.py
    dashboard/
        conftest.py              # Dashboard fixtures (client fixture)
        test_routes_agents.py
        test_routes_swarm.py
        ...
    swarm/
        test_coordinator.py
        test_tasks.py
        test_work_orders.py
    integrations/
        test_chat_bridge.py
        test_telegram.py
        test_voice.py
    self_coding/
        test_git_safety.py
        test_codebase_indexer.py
        test_self_modify.py
    ...
3.2 Add pytest marks for selective execution
# pyproject.toml
[tool.pytest.ini_options]
markers = [
"unit: Unit tests (fast, no I/O)",
"integration: Integration tests (may use SQLite)",
"dashboard: Dashboard route tests",
"swarm: Swarm coordinator tests",
"slow: Tests that take >1 second",
]
Usage:
make test # Run all tests
pytest -m unit # Fast unit tests only
pytest -m dashboard # Just dashboard tests
pytest tests/swarm/ # Just swarm module tests
pytest -m "not slow" # Skip slow tests
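Once tests live in per-module directories, the module markers (`dashboard`, `swarm`) do not need to be written by hand on every test. A root `conftest.py` could apply them from the file path, as in this sketch. It assumes a recent pytest (7 or later, for `item.path` and `config.rootpath`), and the `DIR_MARKERS` mapping is hypothetical. The `unit`, `integration`, and `slow` markers cannot be inferred from a path and would still be applied explicitly.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from a directory under tests/ to a registered marker.
DIR_MARKERS = {"dashboard": "dashboard", "swarm": "swarm"}


def marker_for_path(test_path: Path, tests_root: Path) -> Optional[str]:
    """Return the marker implied by the first directory under tests_root."""
    try:
        parts = test_path.relative_to(tests_root).parts
    except ValueError:
        return None  # file is not under tests_root
    # A single part means the file sits directly in tests/, with no subdirectory.
    return DIR_MARKERS.get(parts[0]) if len(parts) > 1 else None


def pytest_collection_modifyitems(config, items):
    """pytest hook: tag each collected test with its directory's marker."""
    tests_root = Path(config.rootpath) / "tests"
    for item in items:
        marker = marker_for_path(Path(item.path), tests_root)
        if marker:
            item.add_marker(marker)
```

With this in place, `pytest -m swarm` and `pytest tests/swarm/` would select the same tests.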
3.3 Audit and clean skeleton test files
61 test files are empty skeletons — they have imports, class definitions, and fixture setup but zero test functions. These add import overhead and create a false sense of coverage. For each skeleton file:
- If the module it tests is stable and well-covered elsewhere → delete it
- If the module genuinely needs tests → implement the tests or file an issue
- If it's a duplicate (e.g., both test_swarm.py and test_swarm_integration.py exist) → consolidate
Notable skeletons to address:
- test_scripture.py (901 lines, 0 tests): massive infrastructure, no assertions
- test_router_cascade.py (523 lines, 0 tests): same pattern
- test_agent_core.py (457 lines, 0 tests)
- test_self_modify.py (451 lines, 0 tests)
- All 7 files in tests/functional/ (0 working tests)
3.4 Split genuinely oversized test files
For files that DO have tests but are too large:
- test_task_queue.py (560 lines, 30 tests) → split by feature area
- test_mobile_scenarios.py (339 lines, 36 tests) → split by scenario group
Rule of thumb: No test file over 400 lines.
Phase 4: Configuration & Build Cleanup (Low Risk, Medium Impact)
4.1 Clean up pyproject.toml
- Fix the wheel include list to match actual imports
- Consider whether 4 separate CLI entry points belong in one package
- Add [project.urls] for documentation, repository links
- Review dependency pins: some are very loose (>=1.0.0)
4.2 Consolidate Docker files
4 docker-compose variants (default, dev, prod, test) is a lot. Consider:
- docker-compose.yml (base)
- docker-compose.override.yml (dev, auto-loaded by Docker)
- docker-compose.prod.yml (production only)
4.3 Clean up root directory
Non-essential root files to move or delete:
| File | Action |
|---|---|
| apply_security_fixes.py | Move to scripts/ or delete if one-time |
| activate_self_tdd.sh | Move to scripts/ |
| coverage.xml | Gitignore (CI artifact) |
| data/self_modify_reports/ | Gitignore the contents |
Phase 5: Consider Package Extraction (High Risk, High Impact)
Goal: Evaluate whether some modules should be separate packages/repos.
5.1 Candidates for extraction
| Module | Why extract | Dependency direction |
|---|---|---|
| lightning/ | Standalone payment system, security-sensitive | Dashboard imports lightning |
| creative/ | Needs PyTorch, very different dependency profile | Dashboard imports creative |
| timmy-serve | Separate process (port 8402), separate purpose | Shares config + timmy agent |
| self_coding/ + self_modify/ | Self-contained self-modification system | Dashboard imports for routes |
5.2 Monorepo approach (recommended over multi-repo)
If splitting, use a monorepo with namespace packages:
packages/
timmy-core/ # Agent + memory + CLI
timmy-dashboard/ # FastAPI app
timmy-swarm/ # Coordinator + tasks
timmy-lightning/ # Payment system
timmy-creative/ # Creative tools (heavy deps)
Each package gets its own pyproject.toml, test suite, and can be installed
independently. But they share the same repo, CI, and release cycle.
However: This is high effort and may not be worth it unless the team grows or the dependency profiles diverge further. Consider this only after Phases 1-4 are done and the pain persists.
Phase 6: Token Optimization for AI Development (Low Risk, High Impact)
Goal: Reduce context window consumption when AI assistants work on this codebase.
6.1 Lean CLAUDE.md (already covered in Phase 1)
Every byte in CLAUDE.md is read by every AI interaction. Remove duplication.
6.2 Module-level CLAUDE.md files
Instead of one massive guide, put module-specific context where it's needed:
src/swarm/CLAUDE.md # "This module is security-sensitive. Always..."
src/lightning/CLAUDE.md # "Never hard-code secrets. Use settings..."
src/dashboard/CLAUDE.md # "Routes return template partials for HTMX..."
AI assistants read these only when working in that directory.
6.3 Standardize module docstrings
Every __init__.py should have a one-line summary. AI assistants read these
to understand module purpose without reading every file:
"""Swarm — Multi-agent coordinator with auction-based task assignment."""
6.4 Reduce template duplication
49 templates with repeated boilerplate. Consider Jinja2 macros for common patterns (card layouts, form groups, table rows).
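The docstring rule in 6.3 is easy to enforce automatically. A minimal sketch, assuming every `__init__.py` under `src/` is in scope; `packages_missing_docstring` is a hypothetical helper name.

```python
import ast
from pathlib import Path


def packages_missing_docstring(src_dir: Path) -> list[str]:
    """List __init__.py files under src_dir that have no module docstring."""
    missing = []
    for init in sorted(src_dir.rglob("__init__.py")):
        tree = ast.parse(init.read_text(encoding="utf-8"))
        if ast.get_docstring(tree) is None:
            missing.append(init.relative_to(src_dir).as_posix())
    return missing
```

An empty `__init__.py` parses to an empty module, so it is reported as missing a docstring.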
Prioritized Execution Order
| Priority | Phase | Effort | Risk | Impact |
|---|---|---|---|---|
| 1 | Phase 1: Doc cleanup | 2-3 hours | Low | High — immediate token savings |
| 2 | Phase 6: Token optimization | 1-2 hours | Low | High — ongoing AI efficiency |
| 3 | Phase 4: Config/build cleanup | 1-2 hours | Low | Medium — hygiene |
| 4 | Phase 2: Module consolidation | 4-8 hours | Medium | High — structural improvement |
| 5 | Phase 3: Test reorganization | 3-5 hours | Medium | Medium — faster test cycles |
| 6 | Phase 5: Package extraction | 8-16 hours | High | High — only if needed |
Quick Wins (Can Do Right Now)
- Delete MEMORY.md, WORKSET_PLAN.md, WORKSET_PLAN_PHASE2.md (3 files, 0 risk)
- Move PLAN.md, IMPLEMENTATION_SUMMARY.md, and the two quality docs to docs/ (4 files)
- Slim README to ~80 lines
- Fix pyproject.toml wheel includes (11 missing modules)
- Gitignore coverage.xml and data/self_modify_reports/
- Delete dashboard/routes/mobile_test.py (test page in production routes)
- Delete or gut empty test skeletons (61 files with 0 tests; they waste CI time and create noise)
What NOT to Do
- Don't rewrite from scratch. The code works. Refactor incrementally.
- Don't split into multiple repos. Monorepo with packages (if needed) is simpler for a small team.
- Don't change the tech stack. FastAPI + HTMX + Jinja2 is fine. Don't add React, Vue, or a SPA framework.
- Don't merge CLAUDE.md into README. They serve different audiences.
- Don't remove test files that contain real tests just to reduce the count. Reorganize them. Empty skeletons (section 3.3) are a separate case.
- Don't break the singleton pattern. It works for this scale.
Success Metrics
| Metric | Original | Target | Current |
|---|---|---|---|
| Root .md files | 10 | 3 | 5 |
| Root markdown size | 87KB | ~20KB | ~28KB |
| src/ modules | 28 | ~12-15 | 14 |
| Dashboard routes | 27 | ~12-15 | 22 |
| Test organization | flat | mirrored | mirrored |
| Tests passing | 471 | 500+ | 1462 |
| Wheel modules | 17/28 | all | all |
| Module-level docs | 0 | all key modules | 6 |
| AI context reduction | — | ~40% | ~50% (fewer modules to scan) |
Execution Status
Completed
- Phase 1: Doc cleanup. README 303→93 lines, CLAUDE.md 267→80, AGENTS.md 342→72, deleted 3 session docs, archived 4 planning docs
- Phase 4: Config/build cleanup. Fixed 11 missing wheel modules, added pytest markers, updated .gitignore, moved scripts to scripts/
- Phase 6: Token optimization. Added docstrings to 15+ __init__.py files
- Phase 3: Test reorganization. 97 test files organized into 13 subdirectories mirroring source structure
- Phase 2a: Route consolidation. 27 → 22 route files (merged voice, swarm internal/ws, self-modify; deleted mobile_test)
- Phase 2b: Full module consolidation. 28 → 14 modules. All merges completed in a single pass with automated import rewriting (66 source files + 13 test files updated). Modules consolidated:
  - work_orders/ + task_queue/ → swarm/
  - self_modify/ + self_tdd/ + upgrades/ → self_coding/
  - tools/ → creative/tools/
  - chat_bridge/ + telegram_bot/ + shortcuts/ + voice/ → integrations/ (new)
  - ws_manager/ + notifications/ + events/ + router/ → infrastructure/ (new)
  - agents/ + agent_core/ + memory/ → timmy/
  - pyproject.toml entry points and wheel includes updated
  - Module-level CLAUDE.md files added (Phase 6.2)
  - Zero test regressions: 1462 tests passing
- Phase 6.2: Module-level CLAUDE.md. Added to swarm/, self_coding/, infrastructure/, integrations/, creative/, lightning/
Remaining
- Phase 5: Package extraction — only if team grows or dep profiles diverge