
Give Your AI Agent Memory That Actually Works

3/2/2026 · Kumiho

Your AI assistant forgets everything the moment the conversation ends. Every session starts from zero. You repeat yourself. It asks the same questions. Context disappears.

We built a fix.

Introducing @kumiho/openclaw-kumiho

The Kumiho plugin for OpenClaw gives your agent persistent, graph-native memory that survives sessions, model swaps, and channel switches — with a privacy-first architecture where raw conversations never leave your machine.

Install it in two commands:

# 1. Install the OpenClaw plugin
openclaw plugins install @kumiho/openclaw-kumiho

# 2. Set up the Python backend + authenticate
npx --package=@kumiho/openclaw-kumiho kumiho-setup

That's it. Your agent now remembers.

How It Works

Every conversation flows through two automatic hooks:

Auto-Recall — Before your agent responds, Kumiho searches its memory graph for anything relevant to what you just said. Facts, preferences, past decisions — all injected into context silently.

Auto-Capture — After the agent responds, Kumiho extracts what matters (facts, decisions, action items) and stores structured summaries in a Neo4j graph database. Not raw chat logs — distilled knowledge.

The result: your agent builds a growing understanding of you over time, across every conversation.

[Figure: openclaw-kumiho diagram]

Zero-Latency Recall

Memory lookup doesn't slow your agent down. After the first turn, recall adds 0ms to response time.

Here's the trick: while you're reading the agent's response, Kumiho prefetches memories for the next likely topic in the background. When you send your next message, the context is already waiting. No round-trip. No delay.

  1. Turn 1 (cold start): Recall runs in parallel with a timeout of up to 1.5s; if it hasn't finished by then, the agent responds without it.
  2. Background prefetch: While you read the response, memories for the next turn are fetched.
  3. Turn 2+: Context is already waiting. Zero milliseconds added.
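Those three steps amount to a bounded first lookup plus a background prefetch. The asyncio sketch below illustrates that pattern under stated assumptions: `fake_search` is a hypothetical stand-in for the real memory-graph query, the `PrefetchingRecall` class is hypothetical, and how Kumiho predicts the "next likely topic" isn't described, so the caller simply passes one in.

```python
from __future__ import annotations

import asyncio

RECALL_TIMEOUT_S = 1.5  # cold-start budget quoted in the post


async def fake_search(query: str) -> list[str]:
    """Hypothetical stand-in for a memory-graph lookup."""
    await asyncio.sleep(0.05)  # simulated graph/network latency
    return [f"memory about {query!r}"]


class PrefetchingRecall:
    """Sketch: timeout-bounded recall on a cold start, prefetched recall afterwards."""

    def __init__(self) -> None:
        self._prefetch: asyncio.Task[list[str]] | None = None

    async def recall(self, message: str) -> list[str]:
        task, self._prefetch = self._prefetch, None
        if task is not None and task.done():
            # Turn 2+: results are already waiting, so no latency is added.
            # (A real system would also check the prefetch still fits `message`.)
            return task.result()
        if task is not None:
            task.cancel()  # prefetch not ready yet; fall back to a bounded lookup
        try:
            # Cold start: wait at most RECALL_TIMEOUT_S, then let the agent proceed.
            return await asyncio.wait_for(fake_search(message), RECALL_TIMEOUT_S)
        except asyncio.TimeoutError:
            return []

    def start_prefetch(self, likely_topic: str) -> None:
        """Begin the next lookup in the background while the user reads the reply."""
        self._prefetch = asyncio.create_task(fake_search(likely_topic))


async def demo() -> tuple[list[str], list[str]]:
    recall = PrefetchingRecall()
    first = await recall.recall("trip to Lisbon")     # cold start
    recall.start_prefetch("hotel options in Lisbon")  # caller-supplied guess at the next topic
    await asyncio.sleep(0.1)                          # user reading time
    second = await recall.recall("any hotel ideas?")  # served from the prefetch
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Falling back to a bounded lookup when the prefetch isn't ready keeps the worst case at the cold-start budget instead of blocking on a slow background task.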

Privacy-First by Design

YOUR DEVICE                         KUMIHO CLOUD
========================            ========================
Raw conversations        ----X----> Never uploaded
Voice recordings         ----X----> Never uploaded
Images / screenshots     ----X----> Never uploaded

Structured summaries     ----------> Stored in graph DB
Extracted facts          ----------> Stored in graph DB

Raw data stays on your machine. Only structured, PII-redacted summaries reach the graph database. Emails, phone numbers, and SSNs are automatically scrubbed before anything leaves your device.
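The post doesn't describe how the scrubbing is implemented. Purely as an illustration, here is a regex-based redactor for the three categories it names; the patterns (including the US-style phone and SSN formats) are assumptions, not Kumiho's actual rules.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?:\+1[ .-]?)?\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each PII match with a [TYPE] placeholder before anything is summarized or uploaded."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Reach me at alice@example.com or (555) 123-4567; SSN 123-45-6789."))
    # → Reach me at [EMAIL] or [PHONE]; SSN [SSN].
```

Patterns like these only catch well-formed identifiers; names, street addresses, and international number formats would need additional detection.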

Cross-Channel Continuity

Start a conversation on Telegram in the morning. Continue on Slack at lunch. Switch to WhatsApp on your phone. Kumiho tracks memory by user, not by channel — your agent's understanding follows you everywhere.

Session IDs are user-centric:

alice-personal:user-7f3a9b:20260203:001

Same user, any channel, continuous memory.
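The post shows one example ID but not its schema. Reading it as four colon-separated fields (a context such as an agent or workspace name, the user, a YYYYMMDD date, and a per-day sequence number) is our assumption; under that reading, a parser might look like this. The `SessionId` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SessionId:
    """Hypothetical breakdown of a user-centric session ID (field names are assumed)."""
    context: str   # e.g. "alice-personal"
    user: str      # e.g. "user-7f3a9b"; identical on every channel
    day: date      # e.g. 2026-02-03
    sequence: int  # e.g. 1 for the first session of that day

    @classmethod
    def parse(cls, raw: str) -> "SessionId":
        context, user, day, seq = raw.split(":")  # raises ValueError if not 4 fields
        return cls(context, user, datetime.strptime(day, "%Y%m%d").date(), int(seq))

    def __str__(self) -> str:
        return f"{self.context}:{self.user}:{self.day:%Y%m%d}:{self.sequence:03d}"


if __name__ == "__main__":
    sid = SessionId.parse("alice-personal:user-7f3a9b:20260203:001")
    print(sid.user, sid.day, sid.sequence)  # → user-7f3a9b 2026-02-03 1
```

Because the `user` field carries no channel information, a Telegram session and a Slack session for the same person resolve to the same memory key.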

Two-Track Consolidation

Short-term memories live in a Redis working buffer. They get consolidated into the long-term Neo4j graph via two tracks:

Track       Trigger                                            Default
=========   ================================================   ======================
Threshold   messageCount >= consolidationThreshold             20 messages (10 turns)
Idle        No activity for idleConsolidationTimeout seconds   300s (5 min)

Short conversations don't get lost. Long conversations don't overflow. Both tracks reset the counter and start a fresh session.
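The two tracks can be modeled as a small state machine over the working buffer. This sketch is illustrative only: the `WorkingBuffer` class and its method names are hypothetical, the buffer is an in-memory list rather than Redis, and `consolidate` just returns the messages it would hand to the long-term graph. Only the two trigger conditions and their defaults come from the table above.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

CONSOLIDATION_THRESHOLD = 20        # messages (10 turns), the post's default
IDLE_CONSOLIDATION_TIMEOUT = 300.0  # seconds of inactivity, the post's default


@dataclass
class WorkingBuffer:
    """In-memory stand-in for the Redis working buffer (hypothetical)."""
    session: int = 1
    messages: list[str] = field(default_factory=list)
    last_activity: float = field(default_factory=time.monotonic)

    def add(self, message: str, now: float | None = None) -> list[str] | None:
        """Append a message; consolidate if the threshold track fires."""
        self.messages.append(message)
        self.last_activity = time.monotonic() if now is None else now
        if len(self.messages) >= CONSOLIDATION_THRESHOLD:
            return self.consolidate()
        return None

    def check_idle(self, now: float | None = None) -> list[str] | None:
        """Called periodically; consolidate if the idle track fires."""
        now = time.monotonic() if now is None else now
        if self.messages and now - self.last_activity >= IDLE_CONSOLIDATION_TIMEOUT:
            return self.consolidate()
        return None

    def consolidate(self) -> list[str]:
        """Flush the buffer, reset the counter, and start a fresh session."""
        flushed, self.messages = self.messages, []
        self.session += 1
        return flushed


if __name__ == "__main__":
    buf = WorkingBuffer(last_activity=0.0)
    for i in range(20):
        flushed = buf.add(f"message {i}", now=float(i))
    print(len(flushed), buf.session)  # → 20 2
```

Short conversations are picked up by `check_idle`, long ones by the threshold check in `add`, and both paths go through the same reset in `consolidate`.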

Dream State: Memory Maintenance While You Sleep

Kumiho includes a built-in Dream State scheduler. Every night, it reviews its own memories — deprecating stale facts, discovering connections between memories, and keeping the graph clean. Like a brain consolidating during sleep.

As of version 0.2.2, Dream State runs natively inside the plugin using OpenClaw's LLM routing. No external API keys needed.

Configure it in your plugin config or ~/.kumiho/preferences.json:

{
  "dreamState": {
    "schedule": "0 3 * * *",
    "model": {
      "provider": "anthropic",
      "model": "claude-haiku-4-5-20251001"
    }
  }
}

Or test it manually:

openclaw kumiho dream
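The `schedule` value in the config above uses standard five-field cron syntax (minute, hour, day of month, month, day of week), so `"0 3 * * *"` means 03:00 every day. The post doesn't say whether the scheduler uses local time or UTC; the sketch below assumes naive local wall-clock time and handles only the daily `M H * * *` form, computing the next run without a cron library. `next_daily_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_daily_run(schedule: str, now: datetime) -> datetime:
    """Next fire time strictly after `now` for a daily cron expression 'M H * * *'."""
    minute, hour, dom, month, dow = schedule.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily 'M H * * *' schedules")
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; naive time, DST ignored
    return candidate


if __name__ == "__main__":
    print(next_daily_run("0 3 * * *", datetime(2026, 3, 2, 22, 15)))  # → 2026-03-03 03:00:00
```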

9 Agent Tools

Beyond automatic recall and capture, your agent gets explicit memory operations:

Tool                 What it does
==================   ======================================================
memory_search        Query memories by natural language
memory_store         Explicitly save a fact or decision
memory_get           Retrieve a specific memory by kref
memory_list          List recent memories
memory_forget        Delete or deprecate a memory
memory_consolidate   Force session consolidation
memory_dream         Trigger Dream State maintenance
creative_capture     Save documents, code, or plans with full graph lineage
project_recall       Browse creative outputs by project

Your agent can say "let me remember that" — and actually mean it.

Local or Cloud — Your Choice

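Mode              How it works                                                Best for
===============   =========================================================   ====================================
Local (default)   Python SDK via MCP stdio; everything runs on your machine   Max privacy, development
Cloud             HTTPS API to Kumiho Cloud                                   Multi-device, managed infrastructure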

Local mode needs no server deployment. The plugin spawns a Python process and communicates via MCP — Redis buffer, LLM summarization, PII redaction, and Neo4j writes all happen locally.
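MCP's stdio transport exchanges JSON-RPC 2.0 messages over the child process's stdin and stdout, one message per line. The sketch below only builds and parses those lines rather than spawning `kumiho-mcp`. The tool name `memory_search` is borrowed from the tools table above, but whether the Python server exposes a tool under exactly that name, and its `query` argument, are assumptions; a real client must also complete MCP's `initialize` handshake first.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def encode_tool_call(name: str, arguments: dict) -> str:
    """Build one newline-terminated JSON-RPC 'tools/call' request for the stdio transport."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # Without indent=, json.dumps escapes newlines inside strings, so the message is one line.
    return json.dumps(request) + "\n"


def decode_message(line: str) -> dict:
    """Parse one line read from the server's stdout."""
    return json.loads(line)


if __name__ == "__main__":
    # "query" is a hypothetical argument name.
    print(encode_tool_call("memory_search", {"query": "what languages does the user know"}), end="")
```

In a real client these lines would be written to the subprocess's stdin (for example one opened with `subprocess.Popen(..., stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)`) and responses read back with `readline()`.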

Configuration

Minimal (after running kumiho-setup)

{
  "plugins": {
    "entries": {
      "openclaw-kumiho": {
        "enabled": true,
        "config": {
          "userId": "your-user-id"
        }
      }
    }
  }
}

Mode defaults to "local", the Python virtual environment (venv) is auto-detected, and Dream State settings are loaded from ~/.kumiho/preferences.json.

Full Reference

{
  "plugins": {
    "entries": {
      "openclaw-kumiho": {
        "enabled": true,
        "config": {
          "mode": "local",
          "project": "CognitiveMemory",
          "userId": "your-user-id",

          "autoCapture": true,
          "autoRecall": true,
          "localSummarization": true,

          "consolidationThreshold": 20,
          "idleConsolidationTimeout": 300,
          "sessionTtl": 3600,
          "topK": 5,
          "searchThreshold": 0.3,

          "piiRedaction": true,

          "dreamStateSchedule": "0 3 * * *",
          "dreamStateModel": {
            "provider": "anthropic",
            "model": "claude-haiku-4-5-20251001"
          },

          "llm": {
            "provider": "anthropic",
            "model": "claude-haiku-4-5-20251001"
          },

          "local": {
            "pythonPath": "python",
            "command": "kumiho-mcp",
            "timeout": 30000
          }
        }
      }
    }
  }
}
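The reference above doesn't define `topK` and `searchThreshold`. A common reading, and only an assumption here, is that recall drops matches whose relevance score falls below `searchThreshold` and injects at most `topK` of the rest, best first. A sketch of that filter, using a hypothetical `Match` type with scores assumed to lie in [0, 1]:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical search hit: a memory's text and its relevance score."""
    text: str
    score: float


def select_for_context(matches: list[Match], top_k: int = 5, threshold: float = 0.3) -> list[Match]:
    """Keep hits scoring at least `threshold`, highest first, capped at `top_k`."""
    kept = [m for m in matches if m.score >= threshold]
    kept.sort(key=lambda m: m.score, reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    hits = [Match("a", 0.9), Match("b", 0.1), Match("c", 0.5), Match("d", 0.3)]
    print([m.text for m in select_for_context(hits)])  # → ['a', 'c', 'd']
```

Under this reading, raising the threshold trades recall for precision: fewer, but more relevant, memories reach the prompt.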

The Numbers

We benchmarked Kumiho on the LoCoMo-Plus evaluation suite, the hardest test of long-term conversational memory for AI agents.

System                        Accuracy
===========================   ========
RAG (text-embedding-large)    29.8%
Mem0                          41.4%
GPT-4o (full context)         41.9%
Gemini 2.5 Pro (1M context)   45.7%
Kumiho Cognitive Memory       93.3%
  • Retrieval accuracy: 98.5% across 401 questions
  • Total benchmark cost: ~$14
  • Uses GPT-4o-mini for most operations — one of the cheapest models available
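To put the reported figures in absolute terms, the arithmetic below converts the 98.5% retrieval rate into a question count, compares Kumiho's accuracy with the strongest baseline in the table, and spreads the ~$14 cost across the 401 questions (assuming that total covers exactly those 401 questions).

```python
QUESTIONS = 401
RETRIEVAL_ACCURACY = 0.985
KUMIHO_ACCURACY = 93.3   # percent
BEST_BASELINE = 45.7     # Gemini 2.5 Pro (1M context), percent
TOTAL_COST_USD = 14.0    # approximate, as reported

retrieved = round(QUESTIONS * RETRIEVAL_ACCURACY)       # questions with correct retrieval
gap_points = round(KUMIHO_ACCURACY - BEST_BASELINE, 1)  # percentage-point gap
ratio = KUMIHO_ACCURACY / BEST_BASELINE                 # relative improvement
cost_per_question = TOTAL_COST_USD / QUESTIONS

print(f"{retrieved}/{QUESTIONS} retrieved, +{gap_points} pts ({ratio:.2f}x), ${cost_per_question:.3f}/question")
# → 395/401 retrieved, +47.6 pts (2.04x), $0.035/question
```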

Graph-native retrieval isn't an incremental improvement over RAG. It's a different category.

Read the full benchmark breakdown →

CLI Commands

# Search memories
openclaw kumiho search "what languages does the user know"

# Show memory stats
openclaw kumiho stats

# Manually consolidate
openclaw kumiho consolidate

# Trigger Dream State
openclaw kumiho dream

# Capture a creative output
openclaw kumiho capture "Blog Draft" my-blog --kind document

Chat Commands

In any chat channel:

/memory stats
/capture Blog Draft | my-blog | document
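The `/capture` chat command appears to carry the same three values as the CLI form shown earlier (title, project, kind), separated by `|`. Assuming that mapping, which the post implies but doesn't spell out, a parser could look like this; `parse_capture` is a hypothetical helper.

```python
def parse_capture(command: str) -> tuple[str, str, str]:
    """Split '/capture <title> | <project> | <kind>' into its three fields."""
    prefix = "/capture"
    if not command.startswith(prefix + " "):
        raise ValueError("expected a /capture command")
    parts = [p.strip() for p in command[len(prefix):].split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError("usage: /capture <title> | <project> | <kind>")
    title, project, kind = parts
    return title, project, kind


if __name__ == "__main__":
    print(parse_capture("/capture Blog Draft | my-blog | document"))
    # → ('Blog Draft', 'my-blog', 'document')
```

A title that itself contains `|` would be split incorrectly by this sketch, whereas the CLI form takes the title as a single quoted argument.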

Get Started

# 1. Install the plugin
openclaw plugins install @kumiho/openclaw-kumiho

# 2. Set up Python backend + authenticate
npx --package=@kumiho/openclaw-kumiho kumiho-setup

# 3. Restart your gateway
openclaw gateway restart

Your agent wakes up with memory on the next conversation.