Add post-generation similarity check to ThinkingEngine.think_once().
Problem: Timmy's thinking engine generates repetitive thoughts because
small local models ignore 'don't repeat' instructions in the prompt.
The same observations ('still no chat messages', 'Alexander's name is in
profile') appeared 14+ times in a single day's journal.
Fix: After generating a thought, compare it against the last 5 thoughts
using SequenceMatcher. If similarity >= 0.6, retry with a new seed up to
2 times. If all retries produce repetitive content, discard rather than
store. Uses stdlib difflib — no new dependencies.
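A minimal sketch of the check and retry loop described above, not the actual thinking.py code. The 0.6 threshold, 5-thought window, and 2 retries come from this message. The function names, the `generate(attempt)` callback, and the lowercase/strip normalization are assumptions for illustration.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.6  # from the commit message
RECENT_WINDOW = 5           # compare against the last 5 thoughts
MAX_RETRIES = 2             # retries after the first attempt


def is_too_similar(candidate, recent, threshold=SIMILARITY_THRESHOLD):
    """Return True if candidate is a near-duplicate of any recent thought."""
    # Normalization (strip + lowercase) is an assumption, not from the source.
    cand = candidate.strip().lower()
    for previous in recent[-RECENT_WINDOW:]:
        ratio = SequenceMatcher(None, cand, previous.strip().lower()).ratio()
        if ratio >= threshold:
            return True
    return False


def think_with_dedup(generate, recent):
    """Call generate(attempt) up to 1 + MAX_RETRIES times.

    Returns the first sufficiently novel thought, or None when every
    attempt was repetitive (the caller then discards instead of storing).
    """
    for attempt in range(1 + MAX_RETRIES):
        thought = generate(attempt)
        if not is_too_similar(thought, recent):
            return thought
    return None
```

Returning None rather than raising keeps the "discard" case an ordinary outcome for the caller.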
Changes:
- thinking.py: Add _is_too_similar() method with SequenceMatcher
- thinking.py: Wrap generation in retry loop with dedup check
- test_thinking.py: 7 new tests covering exact match, near match,
different thoughts, retry behavior, and max-retry discard
+96/-20 lines in thinking.py, +87 lines in tests.
Replace in-memory MessageLog with SQLite-backed implementation.
Same API surface (append/all/clear/len) so zero caller changes needed.
- data/chat.db stores messages with role, content, timestamp, source
- Lazy DB connection (opened on first use, not at import time)
- Retention policy: oldest messages pruned when count > 500
- New .recent(limit) method for efficient last-N queries
- Thread-safe with explicit locking
- WAL mode for concurrent read performance
- Test isolation: conftest redirects DB to tmp_path per test
- 8 new tests: persistence, retention, concurrency, source field
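A rough sketch of how such a log could be put together, under stated assumptions: the table name, the tuple row shape, and the `source` default are hypothetical. Only the column names, the 500-message cap, lazy connection, locking, WAL mode, and the method names come from this message.

```python
import sqlite3
import threading
from datetime import datetime, timezone


class MessageLog:
    """SQLite-backed message log (illustrative sketch only)."""

    def __init__(self, db_path="data/chat.db", max_messages=500):
        self._db_path = db_path
        self._max = max_messages
        self._lock = threading.Lock()
        self._conn = None  # opened lazily on first use

    def _db(self):
        # Callers must hold self._lock.
        if self._conn is None:
            self._conn = sqlite3.connect(self._db_path, check_same_thread=False)
            self._conn.execute("PRAGMA journal_mode=WAL")
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, role TEXT, "
                "content TEXT, timestamp TEXT, source TEXT)"
            )
        return self._conn

    def append(self, role, content, source="unknown"):
        stamp = datetime.now(timezone.utc).isoformat()
        with self._lock:
            db = self._db()
            db.execute(
                "INSERT INTO messages (role, content, timestamp, source) "
                "VALUES (?, ?, ?, ?)",
                (role, content, stamp, source),
            )
            # Retention: keep only the newest max_messages rows.
            db.execute(
                "DELETE FROM messages WHERE id NOT IN ("
                "SELECT id FROM messages ORDER BY id DESC LIMIT ?)",
                (self._max,),
            )
            db.commit()

    def recent(self, limit):
        """Return the last `limit` messages, oldest first."""
        with self._lock:
            rows = self._db().execute(
                "SELECT role, content, timestamp, source FROM messages "
                "ORDER BY id DESC LIMIT ?",
                (limit,),
            ).fetchall()
        return rows[::-1]

    def all(self):
        with self._lock:
            return self._db().execute(
                "SELECT role, content, timestamp, source FROM messages ORDER BY id"
            ).fetchall()

    def clear(self):
        with self._lock:
            db = self._db()
            db.execute("DELETE FROM messages")
            db.commit()

    def __len__(self):
        with self._lock:
            return self._db().execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Passing `":memory:"` as `db_path` gives an isolated database per instance, which is one way to get the per-test isolation mentioned above without touching disk.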
Closes #46
Allow specifying a named session via --session-id for conversation persistence.
Use cases:
- Autonomous loops can have their own session (e.g. --session-id loop)
- Multiple users/agents can maintain separate conversations
- Testing different conversation threads without polluting the default
Precedence: --session-id > --new > default 'cli' session
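The precedence rule can be sketched as a small resolver. This is illustrative only: the function name is hypothetical, and generating a random ID for --new is an assumption, since this message does not say how --new names its session.

```python
import uuid

DEFAULT_SESSION = "cli"  # default session name from the commit message


def resolve_session_id(session_id=None, new=False):
    """Apply the precedence: --session-id > --new > default 'cli'."""
    if session_id:
        return session_id
    if new:
        # Assumption: --new starts a fresh session with a random ID.
        return uuid.uuid4().hex
    return DEFAULT_SESSION
```

With this ordering, an explicit `--session-id loop` wins even when `--new` is also passed.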
Fixes #52
- Replace eval() in calculator() with _safe_eval() that walks the AST
and only permits: numeric constants, arithmetic ops (+,-,*,/,//,%,**),
unary +/-, math module access, and whitelisted builtins (abs, round,
min, max)
- Reject all other syntax: imports, attribute access on non-math objects,
lambdas, comprehensions, string literals, etc.
- Add 39 tests covering arithmetic, precedence, math functions,
allowed builtins, error handling, and 14 injection prevention cases
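For illustration, an AST-walking evaluator along these lines might look like the sketch below. It is not the actual _safe_eval(): the helper names, error messages, and the rule of rejecting underscore-prefixed math attributes are assumptions.

```python
import ast
import math
import operator

_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_BUILTINS = {"abs": abs, "round": round, "min": min, "max": max}


def safe_eval(expression):
    """Evaluate an arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


def _eval(node):
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Attribute):
        # Only public names on the math module, e.g. math.pi or math.sqrt.
        if (
            isinstance(node.value, ast.Name)
            and node.value.id == "math"
            and not node.attr.startswith("_")
            and hasattr(math, node.attr)
        ):
            return getattr(math, node.attr)
        raise ValueError("attribute access is only allowed on math")
    if isinstance(node, ast.Call) and not node.keywords:
        if isinstance(node.func, ast.Name) and node.func.id in _BUILTINS:
            func = _BUILTINS[node.func.id]
        elif isinstance(node.func, ast.Attribute):
            func = _eval(node.func)
        else:
            raise ValueError("call target is not whitelisted")
        return func(*(_eval(arg) for arg in node.args))
    # Everything else (names, lambdas, comprehensions, ...) is rejected.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")
```

Because the walker only dispatches on an explicit set of node types, any syntax not listed falls through to the final rejection. Note that this sketch does not bound the cost of `**` with very large operands.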
Three fixes from real-world testing:
1. Event loop: replaced asyncio.run() with a persistent loop so
Agno's MCP sessions survive across conversation turns. No more
'Event loop is closed' errors on turn 2+.
2. Markdown stripping: voice preamble tells Timmy to respond in
natural spoken language, plus _strip_markdown() as a safety net
removes **bold**, *italic*, bullets, headers, code fences, etc.
TTS no longer reads 'asterisk asterisk'.
3. MCP noise: _suppress_mcp_noise() quiets mcp/agno/httpx loggers
during voice mode so the terminal shows clean transcript only.
32 tests (12 new for markdown stripping + persistent loop).
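As a hedged illustration of the safety-net idea in fix 2, a regex-based stripper could look like this. It is a sketch, not the real _strip_markdown(); the exact patterns and their order are assumptions.

```python
import re


def strip_markdown(text):
    """Remove common Markdown markers so TTS reads plain sentences."""
    # Code fences: drop the fence lines, keep the body.
    text = re.sub(r"```[^\n]*\n(.*?)```", r"\1", text, flags=re.DOTALL)
    # Inline code.
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # Headers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Bullet markers at the start of a line.
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)
    # Bold, then italic. The markers must hug non-space characters so
    # that plain arithmetic such as "2 * 3 * 4" is left untouched.
    text = re.sub(r"\*\*(\S(?:.*?\S)?)\*\*", r"\1", text)
    text = re.sub(r"\*(\S(?:.*?\S)?)\*", r"\1", text)
    return text.strip()
```

Bullets are handled before emphasis so that a leading `* item` is treated as a list marker rather than the start of italic text.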
Replace repeated asyncio.run() calls with a single event loop that
persists across all interview questions. The old approach created and
destroyed loops per question, orphaning MCP stdio transports and
causing 'Event loop is closed' errors on ~50% of questions.
Also adds clean shutdown: closes MCP sessions before closing the loop.
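A minimal sketch of the persistent-loop pattern, using only asyncio. The class name and the `cleanup` callback are hypothetical; in the real code the cleanup step would close the MCP sessions.

```python
import asyncio


class LoopRunner:
    """Run many coroutines on one event loop instead of asyncio.run() each time."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()

    def run(self, coro):
        # Every call reuses the same loop, so transports created on an
        # earlier call stay attached to a live loop.
        return self._loop.run_until_complete(coro)

    def close(self, cleanup=None):
        """Run an optional async cleanup callable, then close the loop."""
        try:
            if cleanup is not None:
                self._loop.run_until_complete(cleanup())
        finally:
            self._loop.close()
```

Running cleanup before `loop.close()` matters because the sessions need a live loop to shut down their transports.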
Ref #36
- memory_system.py: fix regex replacement in update_user_profile().
Use a lambda instead of a raw replacement string so re.sub does not
interpret backslashes or group references in the new value
- memory_system.py: add guards to update_section() for empty/oversized writes
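The replacement fix can be illustrated with a small, self-contained example. The `Field: value` line format and the function name are hypothetical; the point is only that a callable replacement inserts the value literally.

```python
import re


def replace_field(text, field, value):
    """Replace the value on a 'Field: ...' line, inserting value literally."""
    pattern = re.compile(rf"^({re.escape(field)}:\s*).*$", flags=re.MULTILINE)
    # With a string replacement, re.sub would process escapes such as
    # "\1" or "\g<0>" inside `value`. A callable's return value is used as-is.
    return pattern.sub(lambda match: match.group(1) + value, text)
```

For example, a value like `C:\users\alex` passes through unchanged here, whereas embedding it in a replacement string would make re.sub try to interpret `\u` as an escape.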
Ref #39
Old hardcoded IDs (seer, forge, echo, helm, quill) replaced with
YAML-defined IDs (orchestrator, researcher, coder, writer, memory,
experimenter). Added test that old names are explicitly rejected.
Agno's MCPTools has an undocumented executable whitelist that blocks
gitea-mcp (Go binary). Switch to server_params=StdioServerParameters()
which bypasses this restriction. Also fixes:
- Use tools.session.call_tool() for standalone invocation (MCPTools
doesn't expose call_tool() directly)
- Use close() instead of disconnect() for cleanup
- Resolve gitea-mcp path via ~/go/bin fallback when not on PATH
- Stub mcp.client.stdio in test conftest
Smoke-tested end-to-end against real Gitea: connect, list_issues,
create issue, close issue, create_gitea_issue_via_mcp — all pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix tool names to match gitea-mcp server: issue_write, issue_read,
list_issues, pull_request_write, etc. (old names didn't exist)
- Fix timeout → timeout_seconds (MCPTools API)
- Move mcp from optional to core dependency (required for agent)
- Add PR tools (pull_request_write/read, list_pull_requests)
- Fix create_gitea_issue_via_mcp to use issue_write with method="create"
- Update tool_safety.py and tests for corrected names
- Regenerate poetry.lock
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The httpx AsyncClient was cached across asyncio.run() boundaries.
Each asyncio.run() creates and closes a new event loop, leaving the
cached client's connections on a dead loop. Second+ calls would fail
with "Event loop is closed".
Fix: create a fresh client per request and close it in a finally block.
No more cross-loop client reuse.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Give Timmy the ability to file Gitea issues when he notices bugs,
stale state, or improvement opportunities in his own codebase.
Components:
- GiteaHand async API client (infrastructure/hands/gitea.py)
  - Token auth with ~/.config/gitea/token fallback
  - Create/list/close issues, dedup by title similarity
  - Graceful degradation when Gitea unreachable
- Tool functions (timmy/tools_gitea.py)
  - create_gitea_issue: file issues with dedup + work order bridge
  - list_gitea_issues: check existing backlog
  - Classified as SAFE (no confirmation needed)
- Thinking post-hook (_maybe_file_issues in thinking.py)
  - Every 20 thoughts, LLM classifies recent thoughts for actionable items
  - Auto-files bugs/improvements to Gitea with dedup
  - Bridges to local work order system for dashboard tracking
- Config: gitea_url, gitea_token, gitea_repo, gitea_enabled,
  gitea_timeout, thinking_issue_every
All 1426 tests pass, 74.17% coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidates 3 separate memory databases (semantic_memory.db, swarm.db
memory_entries, brain.db) into a single data/memory.db with facts,
chunks, and episodes tables.
Key changes:
- Add unified schema (timmy/memory/unified.py) with 3 core tables
- Redirect vector_store.py and semantic_memory.py to memory.db
- Add thought distillation: every Nth thought extracts lasting facts
- Enrich agent context with known facts in system prompt
- Add memory_forget tool for removing outdated memories
- Unify embeddings: vector_store delegates to semantic_memory.embed_text
- Bridge spark events to unified event log
- Add pruning for thoughts and events with configurable retention
- Add data migration script (timmy/memory_migrate.py)
- Deprecate brain.memory in favor of unified system
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds /db-explorer page and JSON API to browse all 15 SQLite databases
in data/. Sidebar lists databases with sizes, clicking one renders all
tables as scrollable data tables with row truncation at 200.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite _THINKING_PROMPT with strict rules: 2-3 sentence limit,
anti-confabulation (only reference real data), anti-repetition.
- Add _pick_seed_type() with recent-type dedup (excludes last 3)
- Add _gather_system_snapshot() for real-time grounding (time, thought
count, chat activity, task queue)
- Improve _build_continuity_context() with anti-repetition header and
100-char truncation
- Fix journal + memory timestamps to include local timezone
- 12 new TDD tests covering all improvements
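The recent-type dedup in _pick_seed_type() might be sketched as follows. The seed type names are placeholders (the real list is not given in this message), and the fallback when every type is excluded is an assumption.

```python
import random

# Hypothetical seed types, for illustration only.
SEED_TYPES = ("observation", "memory", "creative", "planning", "gratitude")
RECENT_EXCLUDE = 3  # exclude the last 3 types, per the commit message


def pick_seed_type(recent_types, rng=random):
    """Pick a seed type that was not used in the last RECENT_EXCLUDE thoughts."""
    excluded = set(recent_types[-RECENT_EXCLUDE:])
    candidates = [t for t in SEED_TYPES if t not in excluded]
    # Assumed fallback: if everything is excluded, choose from the full list.
    return rng.choice(candidates or SEED_TYPES)
```

Accepting `rng` as a parameter lets tests pass a seeded `random.Random` for deterministic results.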
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add optional prompt argument to `timmy tick` so custom journal
prompts can be passed from the CLI (seed_type="prompted").
Fix extract_user_name() learning verbs as names (e.g. "Serving").
Now requires the candidate word to start with a capital letter in
the original message, rejects common verb suffixes (-ing, -tion,
etc.), and replaces the duplicate naive regex in TimmyWithMemory with a
call to the fixed ConversationManager.extract_user_name().
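A rough sketch of the two checks described above. The trigger phrases and the regex are hypothetical, and only the two suffixes named in this message are listed; the real method likely covers more.

```python
import re

# Hypothetical trigger phrases; the real pattern is not shown in the source.
_NAME_PATTERN = re.compile(
    r"\b(?:my name is|i am|i'm|call me)\s+([A-Za-z]+)", re.IGNORECASE
)
_VERB_SUFFIXES = ("ing", "tion")  # "-ing, -tion, etc." per the commit message


def extract_user_name(message):
    """Return a likely first name from the message, or None."""
    match = _NAME_PATTERN.search(message)
    if not match:
        return None
    candidate = match.group(1)
    # Must be capitalized in the original message.
    if not candidate[0].isupper():
        return None
    # Reject words that look like verbs or nouns rather than names.
    if candidate.lower().endswith(_VERB_SUFFIXES):
        return None
    return candidate
```

The capitalization check runs on the matched text itself, so the case-insensitive trigger phrase does not weaken it.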
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire MEMORY.md + soul.md into the thinking loop so each heartbeat
is grounded in identity and recent context, breaking repetitive loops.
Pre-hook: _load_memory_context() reads hot memory first (changes each
cycle) then soul.md (stable identity), truncated to 1500 chars.
Post-hook: _update_memory() writes a "Last Reflection" section to
MEMORY.md after each thought so the next cycle has fresh context.
soul.md is read-only from the heartbeat — never modified by it.
All hooks degrade gracefully and never crash the heartbeat.
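The pre-hook's read order, truncation, and graceful degradation could be sketched like this. The function signature and the blank-line separator are assumptions; the 1500-character limit and the hot-memory-first order come from this message.

```python
from pathlib import Path

MAX_CONTEXT_CHARS = 1500  # from the commit message


def load_memory_context(memory_path, soul_path, limit=MAX_CONTEXT_CHARS):
    """Read hot memory first, then soul.md, and truncate the result.

    Missing or unreadable files are skipped so the heartbeat never crashes.
    """
    parts = []
    for path in (memory_path, soul_path):
        try:
            text = Path(path).read_text(encoding="utf-8").strip()
        except OSError:
            continue  # degrade gracefully
        if text:
            parts.append(text)
    return "\n\n".join(parts)[:limit]
```

Because truncation cuts from the end, putting hot memory first means the changing context survives when the combined text exceeds the limit.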
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test data was bleeding into production tasks.db because
swarm.task_queue.models.DB_PATH (relative path) was never patched in
conftest.clean_database. Fixed by switching to absolute paths via
settings.repo_root and adding the missing module to the patching list.
Discord bot could leak orphaned clients on retry after ERROR state.
Added _cleanup_stale() to close stale client/task before each start()
attempt, with improved logging in the token watcher.
Rewrote test_paperclip_client.py to use httpx.MockTransport instead of
patching _get/_post/_delete — tests now exercise real HTTP status codes,
error handling, and JSON parsing. Added end-to-end test for
capture_error → create_task DB isolation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>