How Kumiho's OpenClaw Plugin Handles Memory, Recall, and Creative Outputs

The introductory post covered what the plugin does and why. This one covers how it works — feature by feature.
Auto-Recall and Auto-Capture
These are the two hooks that run silently on every turn.
Auto-Recall fires before your agent responds. The plugin queries the memory graph for anything relevant to the current message and injects those memories into context — past decisions, stated preferences, open tasks, prior work. The agent never sees the raw database call; it just has more context.
Auto-Capture fires after the agent responds. The plugin extracts the high-signal parts of the turn — facts, decisions, named entities, action items — and queues them for consolidation into the graph. Raw conversation text is never stored. Only structured summaries reach the database.
Both hooks are enabled by default. You can turn either off independently:
```json
{
  "autoRecall": false,
  "autoCapture": true
}
```
A common pattern: disable auto-recall for batch processing tasks where you want a clean context, but keep auto-capture so the results get stored.
How Recall Queries Are Built
The plugin doesn't just use the raw user message as the search query. Short conversational messages like "yeah", "what about that?", or "tell me more" carry almost no semantic signal on their own — a naive search on those would return generic results unrelated to what's actually being discussed.
Instead, the plugin builds a context-enriched query:
- If the current message is short (≤ 6 words), the previous user message is appended for topic context
- The first 20 words of the last assistant response are also appended
- Tokens are deduplicated and the query is capped at 200 characters
The result: even a vague follow-up surfaces memories relevant to the ongoing topic, not just the literal words in the message.
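The enrichment steps above can be sketched as a single function. This is illustrative only — the function and parameter names are hypothetical, not the plugin's actual internals:

```python
def build_recall_query(message: str, prev_user_msg: str = "",
                       last_assistant_msg: str = "") -> str:
    """Build a context-enriched recall query from the surrounding turns."""
    parts = [message]
    # Short messages (<= 6 words) carry little semantic signal on their
    # own, so borrow the previous user message for topic context.
    if len(message.split()) <= 6 and prev_user_msg:
        parts.append(prev_user_msg)
    # The last assistant response is a strong topic signal: take its
    # first 20 words.
    if last_assistant_msg:
        parts.append(" ".join(last_assistant_msg.split()[:20]))
    # Deduplicate tokens (case-insensitively) while preserving order.
    seen, tokens = set(), []
    for tok in " ".join(parts).split():
        if tok.lower() not in seen:
            seen.add(tok.lower())
            tokens.append(tok)
    # Cap the final query at 200 characters.
    return " ".join(tokens)[:200]
```

With this shape, a vague "tell me more" inherits the topic words of the prior turn instead of being searched literally.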
Recall Scope
In personal DM contexts, recall searches across the entire memory project — personal memories, work memories, any space you've stored context in. The agent's knowledge of your work is just as relevant as its knowledge of your preferences.
In group and team channel contexts, recall is scoped to that channel's space. This prevents one team's context from leaking into another's.
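A minimal sketch of that scoping rule, with hypothetical names:

```python
def recall_scope(context: str, channel_space: str,
                 project: str = "CognitiveMemory") -> dict:
    """Pick the recall search scope for a message's context."""
    if context == "dm":
        # Personal DMs search the whole memory project: work context
        # is as relevant as personal preferences.
        return {"project": project, "space": None}
    # Group/team channels stay scoped to their own space, so one
    # team's context never leaks into another's.
    return {"project": project, "space": channel_space}
```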
Zero-Latency Prefetch
Memory recall adds zero latency after the first turn — because the next turn's memories are fetched in the background while you're reading the current response.
Turn 1 (cold start): Recall fires in parallel with response generation. There's a 1.5-second window — if recall finishes in time, memories are injected into context. If not, the agent responds without them. Either way, nothing blocks.
After each turn (prefetch): While you're reading the response, the plugin uses the current conversation thread to predict what you'll ask next and prefetches those memories. The prefetch result is stored in session state. When your next message arrives, the recalled context is already waiting — no round-trip needed.
Turn 2+: Zero milliseconds added to response time.
The prefetch query uses the same context-enrichment logic as auto-recall — the assistant's last response is the strongest signal for what topic comes next.
Two-Track Consolidation
Short-term memories live in a Redis working buffer. Consolidation to the long-term Neo4j graph happens on two independent tracks:
Threshold track — fires when the message count hits consolidationThreshold (default: 20 messages / 10 turns). Designed for long, continuous conversations.
Idle track — fires when the session goes quiet for idleConsolidationTimeout seconds (default: 300s). Designed for short conversations that end naturally without hitting the threshold.
Without the idle track, a 5-message session where the user closes the tab would never consolidate. Both tracks are needed.
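The two tracks can be sketched as a small trigger object (hypothetical names; in the real plugin this state lives in the Redis session):

```python
import time

class ConsolidationTrigger:
    """Two independent flush tracks over the working buffer:
    a message-count threshold and an idle timeout."""

    def __init__(self, threshold: int = 20, idle_timeout_s: float = 300.0):
        self.threshold = threshold
        self.idle_timeout_s = idle_timeout_s
        self.message_count = 0
        self.last_activity = time.monotonic()

    def on_message(self) -> bool:
        """Threshold track: True when the buffer should flush."""
        self.message_count += 1
        self.last_activity = time.monotonic()
        return self.message_count >= self.threshold

    def idle_check(self, now=None) -> bool:
        """Idle track: flush sessions that went quiet before ever
        reaching the threshold."""
        now = time.monotonic() if now is None else now
        return (self.message_count > 0
                and (now - self.last_activity) >= self.idle_timeout_s)
```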
The before_compaction Signal
Before either track flushes, the plugin emits a before_compaction signal. This gives the agent a chance to act before the working buffer is cleared.
A practical example: you've been working with an agent on a code refactor across 20 messages. The threshold fires. Before the flush, before_compaction fires — the agent can call creative_capture to save any intermediate artifacts (diagrams, draft specs, partial implementations) that haven't been explicitly captured yet. Once the buffer clears, those outputs are preserved in the graph with full provenance, not lost in a summarized transcript.
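The ordering matters: signal first, clear second. A sketch of that flush path (the handler-registration API here is hypothetical):

```python
class WorkingBuffer:
    """Minimal flush path: emit before_compaction so handlers can act
    while the buffer is still intact, then clear it."""

    def __init__(self):
        self.messages: list = []
        self._handlers: list = []

    def on_before_compaction(self, handler) -> None:
        self._handlers.append(handler)

    def flush(self) -> list:
        # Handlers (e.g. an agent calling creative_capture) see a copy
        # of the buffer before it is cleared.
        for handler in self._handlers:
            handler(list(self.messages))
        flushed, self.messages = self.messages, []
        return flushed
```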
Both thresholds are configurable:

```json
{
  "consolidationThreshold": 20,
  "idleConsolidationTimeout": 300
}
```
Force consolidation immediately:
```
memory_consolidate
```
Dream State
Dream State is a scheduled maintenance cycle that runs while you're not using the agent — typically overnight.
What it does in a single run:
- Reviews recent memory events (revisions written to the graph since the last run)
- Assesses each revision for staleness, conflicts, and relevance decay
- Adds or updates semantic tags
- Creates relationship edges between related memories
- Deprecates facts that have been superseded
The result: the graph gets structurally richer over time without any manual curation.
Dream State uses OpenClaw's own LLM auth — no separate API key configuration needed in most setups. The plugin reads directly from OpenClaw's auth profiles at runtime.
Configure the schedule in `~/.kumiho/preferences.json`:

```json
{
  "dreamState": {
    "schedule": "0 3 * * *",
    "model": {
      "provider": "anthropic",
      "model": "claude-haiku-4-5-20251001"
    }
  }
}
```
Or in the `openclaw.json` plugin config:

```json
{
  "dreamStateSchedule": "0 3 * * *",
  "dreamStateModel": {
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001"
  }
}
```
Trigger manually:
```
memory_dream
```
A typical run: 23 events processed in under 2 minutes using claude-haiku. The model cost per run is negligible. The structural improvement compounds over weeks.
Creative Memory
Most AI memory systems deal only with conversational context — what you said, what preferences you expressed. Creative memory solves a different problem: agents don't remember what they've built.
The scenario: You're building a CLI tool across three sessions spread over two weeks. Session 1: you and the agent sketch an architecture and write the core module. Session 2: you refine the API design. Session 3: you come back to add a feature — and the agent has no memory of the prior design decisions. It asks what language you're using. It suggests patterns you already rejected. You re-explain everything.
Creative memory prevents this. Every output the agent produces — documents, code, designs, plans — gets a versioned, addressable identity in the graph. When you resume work, creative_recall brings the full history back.
Capturing an output
```
creative_capture
  title: "CLI tool architecture v2"
  content: <the document>
  creativeProject: "my-project"
  project: "cli-tool"
  kind: "plan"
  sourceMemoryKref: kref://CognitiveMemory/work/.../prior-decision.conversation?r=3
```
This creates:
- An Item node: `kref://my-project/cli-tool/cli-tool-architecture-v2.plan`
- A Revision node: `kref://...?r=1`
- A `DERIVED_FROM` edge linking the revision to the prior decision memory (if `sourceMemoryKref` is provided)
Every subsequent revision gets its own ?r=N kref. The full history is preserved and navigable.
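Given the kref shapes shown above, a small parser sketch (a hypothetical helper, not part of the plugin's API) illustrates how the pieces decompose:

```python
from urllib.parse import urlparse, parse_qs

def parse_kref(kref: str) -> dict:
    """Split a kref into project, space, item, and optional revision."""
    parsed = urlparse(kref)
    if parsed.scheme != "kref":
        raise ValueError(f"not a kref: {kref}")
    # netloc is the project; the path holds space segments plus the item.
    path_parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    revision = parse_qs(parsed.query).get("r", [None])[0]
    return {
        "project": path_parts[0],
        "space": "/".join(path_parts[1:-1]),
        "item": path_parts[-1],
        "revision": int(revision) if revision else None,
    }
```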
`creative_capture` is async — it returns a job ID immediately:

```
Job ID: 8849c85a-bfa3-4128-9808-f3700f706fb3
Title : "CLI tool architecture v2"
Space : my-project/cli-tool
```
Checking job status
```
creative_job_status
  jobId: 8849c85a-bfa3-4128-9808-f3700f706fb3
```
Returns krefs when done:

```
Status        : done
Item Kref     : kref://my-project/cli-tool/cli-tool-architecture-v2.plan
Revision Kref : kref://my-project/cli-tool/cli-tool-architecture-v2.plan?r=1
Memory Kref   : kref://CognitiveMemory/personal/mem-abc.conversation?r=1
```
Recalling previous outputs
```
creative_recall
  space: "cli-tool"
  creativeProject: "my-project"
  query: "architecture decisions"
```
Returns items ranked by relevance. Without a query, lists all items in the space — useful at the start of a session to orient the agent before diving in.
Keep creative outputs in a dedicated project (e.g. `my-project`), not under `CognitiveMemory`. They live in different graph namespaces for a reason.
The 9 Agent Tools
| Tool | When to use it |
|---|---|
| `memory_search` | Proactively mid-conversation when the user asks about past decisions, prior work, or anything that might have been discussed before. Don't wait for auto-recall — call this explicitly when the topic shifts. |
| `memory_store` | When the user states a preference, makes a decision, or gives a correction. Also when you produce something worth keeping that auto-capture might miss. |
| `memory_get` | When you have a specific kref and need the full memory entry, not a search result. |
| `memory_list` | To browse recent memories at session start, or audit what's been stored in a space. |
| `memory_forget` | To delete stale facts, retracted decisions, or duplicates. Supports kref targeting or search-based deletion. |
| `memory_consolidate` | When you want to force the current session into the graph immediately — before closing a long work session, before switching context. |
| `memory_dream` | To trigger the overnight maintenance cycle on demand — useful after a heavy session to immediately enrich tags and edges. |
| `creative_capture` | For any output worth keeping across sessions: documents, code, designs, specs, plans. Async — returns a job ID. |
| `creative_recall` | To bring previous outputs back at the start of a session, or to search for a specific output across a project space. |
Privacy Architecture
```
YOUR DEVICE                          KUMIHO CLOUD
========================             ========================
Raw conversations      ─── ✗ ───►    Never uploaded
Voice, images, files   ─── ✗ ───►    Never uploaded
PII (emails, phones)   ─── ✗ ───►    Stripped before summary
Structured summaries   ─────────►    Stored in graph DB
Extracted facts        ─────────►    Stored in graph DB
Semantic tags + edges  ─────────►    Stored in graph DB
```
PII redaction runs locally before summarization. By default, emails, phone numbers, SSNs, and credit card numbers are stripped. Disable for development:
```json
{
  "piiRedaction": false
}
```
The `privacy.uploadSummariesOnly` flag (default: `true`) enforces that only structured summaries — never raw transcript text — are sent to the graph database. Local artifact files are kept in `~/.kumiho/artifacts/` by default.
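A rough sketch of local regex-based redaction in this spirit — the patterns here are illustrative; the plugin's real rules are likely stricter:

```python
import re

# Order matters: SSNs and card numbers must be replaced before the
# broad phone pattern, or it would consume them first.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace PII with typed placeholders before summarization,
    so raw identifiers never reach the summarizer or the graph."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```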
Cross-Channel Session Continuity
Session IDs are user-centric, not channel-centric:
```
alice-personal:user-7f3a9b:20260203:001
```
The same user talking through Telegram, Slack, or Discord gets a continuous memory thread. The graph doesn't know or care which channel the message came from — it tracks the user.
The conversation number resets daily (`:20260203:001` → `:20260203:002`; next day `:20260204:001`). Same-day sessions share context more aggressively than cross-day ones.
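The ID scheme can be sketched with a pair of hypothetical helpers:

```python
from datetime import date

def build_session_id(project: str, user_id: str, day: date, n: int) -> str:
    """User-centric session ID: channel-agnostic, with a zero-padded
    per-day conversation counter."""
    return f"{project}:{user_id}:{day.strftime('%Y%m%d')}:{n:03d}"

def next_session_id(prev, project: str, user_id: str, today: date) -> str:
    """Increment the same-day counter, or reset it on a new day."""
    n = 1
    if prev:
        prev_day, prev_n = prev.rsplit(":", 2)[-2:]
        if prev_day == today.strftime("%Y%m%d"):
            n = int(prev_n) + 1
    return build_session_id(project, user_id, today, n)
```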
Local vs Cloud Mode
| | Local | Cloud |
|---|---|---|
| Setup | `kumiho-setup` (installs Python backend) | API key only |
| Privacy | Everything runs on your machine | Summaries sent to Kumiho Cloud |
| Infrastructure | Redis + Neo4j on your machine | Managed by Kumiho |
| Best for | Max privacy, development, single device | Multi-device, teams, production |
Local is the default. The plugin spawns a Python MCP subprocess at startup. The subprocess manages the Redis buffer, runs LLM summarization, handles PII redaction, and writes to Neo4j — all on your machine.
Switching to cloud mode:
```json
{
  "mode": "cloud",
  "apiKey": "your-kumiho-api-key"
}
```
Configuration Reference
```json
{
  "plugins": {
    "entries": {
      "openclaw-kumiho": {
        "enabled": true,
        "config": {
          "mode": "local",
          "project": "CognitiveMemory",
          "userId": "your-user-id",
          "autoCapture": true,
          "autoRecall": true,
          "localSummarization": true,
          "consolidationThreshold": 20,
          "idleConsolidationTimeout": 300,
          "sessionTtl": 3600,
          "topK": 5,
          "searchThreshold": 0.3,
          "piiRedaction": true,
          "dreamStateSchedule": "0 3 * * *",
          "dreamStateModel": {
            "provider": "anthropic",
            "model": "claude-haiku-4-5-20251001"
          },
          "local": {
            "pythonPath": "python",
            "command": "kumiho-mcp",
            "timeout": 30000
          }
        }
      }
    }
  }
}
```