Our Approach to AI Memory: Building the Identity Layer
A position paper on why memory architecture matters for relational AI and how we approach the problem
Key Findings
1. Current LLMs show significant performance gaps on long-term memory tasks (published research reports GPT-4 at F1 = 32 versus human F1 = 88 on the LOCOMO benchmark)
2. The "lost in the middle" phenomenon causes 15-47% performance degradation when relevant information sits mid-context
3. We believe memory, personality, and emotional state must be architected as unified systems, not bolted-on features
4. Our approach separates episodic, semantic, and emotional memory, with different retention and retrieval strategies for each
5. We prioritize relationship coherence over raw retrieval accuracy as our north-star metric
Abstract
Current AI systems face a fundamental limitation: they forget. Despite remarkable advances in language understanding and generation, most AI interactions begin from zero—no memory of previous conversations, no accumulated understanding, no relationship continuity. This isn't a minor inconvenience; it's an architectural constraint that prevents AI from forming genuine relationships with users. This position paper outlines Promitheus's approach to solving the memory problem for relational AI. We examine the limitations of current approaches documented in recent research, including the "lost in the middle" phenomenon where LLMs struggle to use information in long contexts, and the significant gap between human and AI performance on long-term conversational memory tasks. We then describe our architectural philosophy for Promitheus: a system designed around the premise that memory, personality, and emotional continuity are inseparable components of AI identity. This is not an empirical benchmark study. We present our perspective on the problem space, informed by published research, and describe the principles guiding our technical approach. Our goal is to contribute to the broader conversation about what it means to build AI that truly remembers.
1. Introduction: The Memory Problem
Every meaningful human relationship is built on memory. We remember shared experiences, accumulate understanding of each other's preferences and patterns, and develop emotional context that shapes how we interact. Without memory, there is no relationship—only a series of disconnected encounters with strangers.
Current AI systems, despite their impressive capabilities, operate as strangers in every conversation. A user might spend hours sharing their hopes, fears, and daily experiences with an AI assistant, only to return the next day and find the slate wiped clean. This isn't a bug to be fixed with a larger context window; it's a fundamental architectural limitation.
Why This Matters
The implications extend beyond user experience. For AI companions, the inability to remember undermines the entire value proposition. For AI agents that need to act on users' behalf over time, statelessness makes continuity impossible. For any application where trust and understanding matter, starting from zero every session is a non-starter.
At Promitheus, we believe solving the memory problem is a prerequisite to building AI that can form genuine relationships. This paper describes our perspective on the problem and our approach to addressing it.
2. What Research Tells Us
Before describing our approach, we want to acknowledge the significant research that informs our thinking. We are building on the shoulders of others, and it's important to represent the current state of knowledge accurately.
The LOCOMO Benchmark
The LOCOMO (Long-term Conversational Memory) benchmark, published by Maharana et al. in 2024, provides one of the most rigorous evaluations of long-term conversational memory in LLMs. The benchmark tests memory over conversations spanning, on average, 300 turns and 9,000 tokens across up to 35 sessions.
The results are sobering: GPT-4 achieves an F1 score of 32.1 on question-answering tasks, while human performance reaches 87.9. Even with RAG augmentation, models improve by 22-66% yet still underperform humans by approximately 56%. This isn't a small gap to be closed with incremental improvements—it's a fundamental capability difference.
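Scores like these typically use token-overlap F1, the standard extractive-QA metric. A minimal sketch, under the assumption that LOCOMO's scoring follows the usual SQuAD-style convention:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, the standard QA metric (SQuAD-style)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; otherwise score 0.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: how many tokens the two answers share.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(round(token_f1("promoted to marketing manager",
                     "they were promoted to marketing manager"), 2))  # 0.8
```

A partially correct answer scores between 0 and 1, which is why aggregate F1 captures graded memory failures better than exact match.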
The "Lost in the Middle" Phenomenon
Research from Stanford (Liu et al., 2023) documented what they call the "lost in the middle" problem: LLM performance degrades significantly when relevant information appears in the middle of the context rather than at the beginning or end. Performance drops of 15-47% have been observed as context length increases.
This finding has profound implications for memory systems. Simply stuffing more context into a prompt doesn't solve the memory problem—it may actually make retrieval less reliable for information that isn't positioned optimally.
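One common mitigation is to reorder retrieved snippets so the strongest evidence lands at the edges of the context window, where attention is most reliable. A simple sketch; the interleaving heuristic here is illustrative, not a method from the cited paper:

```python
def order_for_context(snippets: list[str]) -> list[str]:
    """Place the highest-ranked snippets at the edges of the prompt,
    pushing the weakest into the middle, where models attend least.
    Input is assumed sorted best-first."""
    front, back = [], []
    for i, snippet in enumerate(snippets):
        # Alternate placements: evens build the front, odds build the back.
        (front if i % 2 == 0 else back).append(snippet)
    # Reversing the back half puts the 2nd-best snippet last.
    return front + back[::-1]

print(order_for_context(["r1", "r2", "r3", "r4", "r5"]))
# ['r1', 'r3', 'r5', 'r4', 'r2'] — r1 and r2 at the edges, r5 mid-context
```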
The 2025-2026 Memory Research Explosion
The past year has seen remarkable progress in agent memory research. A December 2025 survey, "Memory in the Age of AI Agents" (Liu et al.), documents how memory has become a core capability for foundation model-based agents, noting that traditional taxonomies like long/short-term memory are now insufficient to capture the diversity of contemporary systems.
Key advances include:
- **Mem0** (April 2025) demonstrated a 26% relative improvement over OpenAI's memory feature on LOCOMO (66.9% vs 52.9%), with 91% lower latency and 90% fewer tokens
- **MemMachine v0.2** (December 2025) achieved industry-leading LOCOMO scores with ~80% reduction in token usage and 75% faster memory operations
- **KVzip** (November 2025) from Seoul National University enables 3-4x memory compression while maintaining accuracy and doubling response speed
- **Memoria** (December 2025) introduced a modular framework combining session-level summarization with weighted knowledge graph-based user modeling
Agentic Memory: The 2026 Frontier
The most recent research (January 2026) introduces "Agentic Memory" (AgeMem), a framework where LLM agents autonomously decide when to store, retrieve, update, summarize, or discard information. Rather than relying on hand-written rules, the agent learns memory management as part of its policy.
This represents a paradigm shift: memory operations become tool-based actions the agent can invoke, enabling unified long-term and short-term memory management. Early results suggest this approach outperforms static memory architectures on complex, multi-session tasks.
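A toy sketch of the tool-based pattern: memory operations exposed as callables that a learned policy could invoke. Class and tool names are illustrative, not AgeMem's actual API:

```python
from typing import Callable

class MemoryToolbox:
    """Memory operations exposed as tools an agent policy can select among,
    in the spirit of agentic-memory frameworks (names are illustrative)."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}
        self.tools: dict[str, Callable[..., str]] = {
            "store": self.remember,
            "retrieve": self.recall,
            "discard": self.forget,
        }

    def remember(self, key: str, value: str) -> str:
        self.store[key] = value
        return f"stored {key}"

    def recall(self, key: str) -> str:
        return self.store.get(key, "<not found>")

    def forget(self, key: str) -> str:
        self.store.pop(key, None)
        return f"discarded {key}"

    def invoke(self, action: str, **kwargs) -> str:
        # In an agentic system, a learned policy chooses `action`;
        # here we dispatch directly to show the mechanism.
        return self.tools[action](**kwargs)

mem = MemoryToolbox()
mem.invoke("store", key="job", value="works in marketing")
print(mem.invoke("retrieve", key="job"))  # works in marketing
```

The point of the pattern is that "when to remember" becomes a decision the agent makes, not a rule the developer hard-codes.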
2.1 Published Research: Performance Gaps
The following visualization illustrates performance gaps documented in published research. These are not our benchmarks—they represent findings from the papers cited above.
[Figure: LOCOMO Benchmark, Human vs LLM Performance — F1 scores on long-term conversational memory QA tasks (Source: Maharana et al., 2024)]
3. Our Architectural Philosophy
Given the research landscape, we've developed a set of principles that guide our approach to memory architecture. These are not claims about measured performance—they are design principles we believe will lead to better outcomes.
Principle 1: Memory is Identity
We don't treat memory as a feature to be added to a language model. We treat it as foundational to what the AI is. Promitheus is built around the premise that an AI's identity emerges from the accumulation of its memories, the consistency of its personality, and the continuity of its emotional state.
This means memory isn't an optimization problem—it's an identity problem. The question isn't "how do we retrieve relevant information?" but "how do we maintain coherent identity across time?"
Principle 2: Different Memories, Different Systems
Human memory isn't monolithic. We have episodic memory (what happened), semantic memory (what we know), and emotional memory (how things felt). These systems have different characteristics: episodic memories are specific and time-bound; semantic memories are abstracted and durable; emotional memories color our interpretation of everything else.
We believe AI memory systems should mirror this structure. A memory of a specific conversation ("last Tuesday, they told me about their promotion") should be handled differently than accumulated knowledge ("they work in marketing") or emotional context ("discussions of work are generally positive for them").
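The separation could be sketched with simple dataclasses; the field names and retention flags below are our illustrative assumptions, not a real schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EpisodicMemory:
    """A specific, time-bound event: 'last Tuesday they mentioned a promotion'."""
    content: str
    occurred_at: datetime
    consolidated: bool = False  # becomes True once absorbed into semantic memory

@dataclass
class SemanticMemory:
    """Durable, abstracted knowledge: 'they work in marketing'."""
    fact: str
    support_count: int = 1      # how many episodes reinforce this fact

@dataclass
class EmotionalMemory:
    """Affective context that colors retrieval: 'work talk is generally positive'."""
    topic: str
    valence: float = 0.0        # -1.0 (negative) .. +1.0 (positive)
```

Keeping the three types distinct lets each layer carry exactly the metadata its retention and retrieval strategy needs.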
Principle 3: Retrieval Must Be Context-Appropriate
The "lost in the middle" research tells us that raw retrieval isn't enough—where and how information is presented matters. But there's a deeper issue: not all retrievals are appropriate in all contexts.
Consider a memory about a user's deceased parent. This information might be highly relevant to a conversation about grief, but retrieving it during a casual chat about weekend plans could be jarring or harmful. Semantic similarity isn't the same as contextual appropriateness.
We believe retrieval systems must account for emotional context, conversational flow, and relationship dynamics—not just topical relevance.
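A toy gate illustrating the distinction: similarity is necessary but not sufficient. The thresholds here are illustrative, not tuned values:

```python
def should_surface(similarity: float, valence: float,
                   conversation_tone: float, sensitive: bool) -> bool:
    """Gate retrieval on context, not just topical relevance.
    - similarity: semantic match to the current query (0..1)
    - valence / conversation_tone: -1 (negative) .. +1 (positive)
    - sensitive: memory flagged as emotionally heavy (e.g. grief)
    Thresholds are illustrative placeholders."""
    if similarity < 0.5:
        return False
    # A sensitive memory only surfaces when the conversation's tone
    # is close to the memory's own emotional register.
    if sensitive and abs(valence - conversation_tone) > 0.5:
        return False
    return True

# Topically available but jarring: a grief memory (-0.9) during an
# upbeat weekend-plans chat (+0.7) stays suppressed.
print(should_surface(0.8, valence=-0.9, conversation_tone=0.7, sensitive=True))  # False
```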
Principle 4: Consolidation Over Accumulation
Human memory consolidates over time. Individual episodes become generalized knowledge. Details fade while patterns persist. This isn't a failure—it's a feature that prevents cognitive overload and surfaces what matters.
We believe AI memory systems need similar consolidation mechanisms. Without them, memory stores become cluttered with noise, retrieval quality degrades, and costs scale linearly with interaction history. The goal isn't to remember everything—it's to remember what matters.
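A minimal consolidation sketch, under a strong simplifying assumption: episodes are exact-string repeats, whereas a real system would cluster paraphrases before counting:

```python
from collections import Counter

def consolidate(episodes: list[str],
                threshold: int = 3) -> tuple[list[str], list[str]]:
    """Absorb repeated episodic observations into durable semantic facts.
    An observation seen `threshold`+ times becomes a fact and its episodes
    are dropped; rarer (potentially significant) episodes are preserved."""
    counts = Counter(episodes)
    facts = [obs for obs, n in counts.items() if n >= threshold]
    kept = [e for e in episodes if counts[e] < threshold]
    return facts, kept

episodes = ["mentioned work"] * 3 + ["told me about their promotion"]
facts, kept = consolidate(episodes)
print(facts)  # ['mentioned work']
print(kept)   # ['told me about their promotion']
```

Note the asymmetry: the repeated pattern is compressed away, while the one-off significant event survives verbatim, which is the "remember what matters" behavior the principle calls for.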
Principle 5: Reflection as a First-Class Operation
Most memory systems are purely reactive—they only process memories when a user query triggers retrieval. We believe AI needs dedicated time and space for autonomous reflection: consolidating recent experiences, updating self-understanding, processing emotional states, and maintaining internal coherence without the pressure of immediate response.
This mirrors how human memory works during sleep and quiet moments. We don't just recall memories on demand—we actively process and integrate them. Dreams, daydreaming, and idle reflection all serve cognitive functions. AI systems that only think when prompted miss this entire dimension of intelligence.
In Promitheus, reflection is a first-class operation. The AI has scheduled opportunities to review recent interactions, identify patterns, update its understanding of the relationship, and evolve its emotional state. This isn't background optimization—it's core to how identity persists and develops over time.
This approach aligns with recent research on agentic memory (AgeMem, 2026), where agents autonomously decide when to store, retrieve, summarize, or discard information. But we go further: reflection isn't just memory management, it's the mechanism through which AI develops genuine continuity of self.
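A toy reflection pass, with all field names and the decay factor as illustrative assumptions rather than the actual implementation:

```python
import time

def reflect(memory_store: dict) -> dict:
    """One autonomous reflection pass: review recent episodes, update
    pattern counts, and let emotional state drift toward neutral.
    All keys and constants are illustrative."""
    recent = memory_store.get("recent_episodes", [])
    patterns = memory_store.setdefault("patterns", {})
    for episode in recent:
        patterns[episode] = patterns.get(episode, 0) + 1
    # Emotional state decays between interactions instead of persisting raw.
    memory_store["mood"] = memory_store.get("mood", 0.0) * 0.9
    memory_store["recent_episodes"] = []
    memory_store["last_reflection"] = time.time()
    return memory_store

store = {"recent_episodes": ["positive chat about work"], "mood": 0.6}
store = reflect(store)
print(store["patterns"])  # {'positive chat about work': 1}
```

Crucially, this runs on a schedule rather than in response to a user query, which is what makes reflection a first-class operation rather than a retrieval side effect.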
4. Promitheus Architecture Overview
Promitheus implements our architectural philosophy through several interconnected systems. We describe these at a conceptual level—specific implementation details will evolve as we learn.
Three-Layer Memory Model
We structure memory into three layers, each with distinct characteristics:
*Episodic Layer*: Stores specific events and interactions with rich temporal metadata. These memories answer "what happened?" and are tied to particular moments in time. Episodic memories consolidate into semantic memories over time.
*Semantic Layer*: Stores accumulated knowledge about the user—preferences, relationships, facts about their life. These memories answer "what do we know?" and are more stable than episodic memories.
*Emotional Layer*: Tracks affective patterns and relationship dynamics. This layer answers "how do things feel?" and influences both retrieval decisions and response generation.
Continuous Consolidation
Rather than batch processing, Promitheus continuously consolidates memories. Recent interactions are processed immediately for important signals. Over time, episodic memories that reinforce existing patterns are absorbed into semantic memory, while those that represent significant events are preserved.
Context-Aware Retrieval
Every retrieval considers multiple factors: semantic relevance to the current query, temporal recency, emotional appropriateness given the conversational context, and relationship coherence. We don't optimize for a single metric—we balance multiple considerations that together determine what should be surfaced.
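A sketch of the multi-factor blend; the weights and the one-week recency time constant are illustrative defaults, not tuned values from our system:

```python
import math

def retrieval_score(similarity: float, age_seconds: float,
                    emotional_fit: float, coherence: float,
                    weights: tuple = (0.4, 0.2, 0.2, 0.2)) -> float:
    """Blend the factors named in the text into one score:
    semantic relevance, temporal recency, emotional appropriateness,
    and relationship coherence. Weights are illustrative."""
    # Recency decays exponentially with a one-week time constant.
    recency = math.exp(-age_seconds / (7 * 24 * 3600))
    w_sim, w_rec, w_emo, w_coh = weights
    return (w_sim * similarity + w_rec * recency +
            w_emo * emotional_fit + w_coh * coherence)

fresh = retrieval_score(0.9, age_seconds=0, emotional_fit=0.8, coherence=0.7)
stale = retrieval_score(0.9, age_seconds=30 * 24 * 3600,
                        emotional_fit=0.8, coherence=0.7)
print(fresh > stale)  # True: equal relevance, but the fresh memory wins
```

Because no single factor dominates, a highly similar but emotionally inappropriate memory can still lose to a moderately similar, well-fitting one.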
Relationship Coherence
We maintain an explicit model of the relationship between the AI and each user. This model tracks relationship stage, communication patterns, shared history, and current dynamics. It serves as a lens through which all memory operations are filtered.
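A minimal sketch of an explicit relationship model gating memory operations; stage names and session thresholds are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class RelationshipModel:
    """Explicit relationship state used as a lens over memory operations.
    Stages and thresholds are illustrative."""
    stage: str = "new"                 # new -> familiar -> established
    sessions: int = 0
    shared_topics: set = field(default_factory=set)

    def record_session(self, topics: set) -> None:
        self.sessions += 1
        self.shared_topics |= topics
        if self.sessions >= 20:
            self.stage = "established"
        elif self.sessions >= 5:
            self.stage = "familiar"

    def depth_allowed(self) -> bool:
        """Only an established relationship surfaces deeply personal memories."""
        return self.stage == "established"

rel = RelationshipModel()
for _ in range(6):
    rel.record_session({"work"})
print(rel.stage)            # familiar
print(rel.depth_allowed())  # False
```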
4.1 Conceptual Architecture
This diagram illustrates the conceptual flow of memory in Promitheus. It represents our architectural approach, not empirical measurements.
[Interactive timeline: AI Memory Evolution — foundation for modern LLMs; external knowledge integration; scalable similarity search; long-term context retention; affective AI capabilities; unified episodic/semantic/emotional memory; true persistent relationships]
5. What We Don't Know Yet
Intellectual honesty requires acknowledging uncertainty. There's much we don't yet know about building effective memory systems for relational AI.
Optimal Consolidation Strategies
We believe consolidation is necessary, but the optimal strategies for when and how to consolidate remain open questions. How aggressively should episodic memories be compressed? What signals indicate a memory should be preserved versus absorbed? We have hypotheses, but empirical answers will come from deployment and iteration.
Measuring Relationship Quality
We've argued that relationship coherence matters more than raw retrieval accuracy, but measuring relationship quality is genuinely difficult. User satisfaction surveys capture something, but they're noisy and delayed. We're still developing metrics that capture what we actually care about.
Scaling Characteristics
We have architectural beliefs about how to handle scale, but we haven't yet operated at the scale where these beliefs will be truly tested. Memory systems that work well for thousands of users may behave differently at millions. We're designing for scale, but we'll learn the real lessons in production.
Emotional Appropriateness
We believe emotional context should influence retrieval, but defining and measuring "emotional appropriateness" is surprisingly hard. What's appropriate varies by user, relationship stage, and cultural context. We're building systems that can adapt, but the adaptation rules are still being refined.
Long-Term Relationship Dynamics
The research we've cited mostly examines memory over weeks or months. We're interested in relationships that span years. How memory systems should behave over truly long time horizons—how forgetting should work, how personality should evolve—remains largely unexplored territory.
6. Approaches to AI Memory: A Comparison
This tool compares different conceptual approaches to AI memory. The ratings represent our qualitative assessment of architectural trade-offs, not empirical benchmarks. Use this to understand the design space, not to evaluate specific systems.
[Interactive tool: Architecture Comparison — select architectures to compare qualitative ratings across metrics, with an average score per architecture]
7. Why We're Sharing This
We're publishing this position paper for several reasons.
To Contribute to the Conversation
The challenge of building AI that remembers is bigger than any one company. We benefit from published research, and we want to contribute our perspective back to the community. Even if our specific approaches don't become standard, we hope the principles and questions we raise are useful.
To Invite Scrutiny
Describing our approach publicly invites criticism and alternative perspectives. We're confident in our direction but not dogmatic about it. If our architectural principles are flawed, we'd rather learn that sooner through engagement with others thinking about these problems.
To Signal What We Value
By emphasizing relationship coherence over raw retrieval metrics, emotional appropriateness over semantic similarity, and identity preservation over feature accumulation, we're signaling what kind of AI we're trying to build. This helps users understand what to expect and helps potential collaborators understand whether our values align.
To Be Honest About Uncertainty
The AI industry often oversells capabilities and undersells limitations. We've tried to be explicit about what we know, what we believe, and what we're still figuring out. Building trust requires honesty, and honesty includes acknowledging uncertainty.
8. Conclusion
Memory is the foundation of relationship. Current AI systems, despite their remarkable capabilities, lack the persistent memory required for genuine relational continuity. Research demonstrates significant gaps between human and AI performance on memory tasks, and architectural limitations like the "lost in the middle" phenomenon suggest that simply scaling context windows won't solve the problem.
At Promitheus, we're building our system around a core belief: that memory, personality, and emotional continuity are inseparable components of AI identity. Our architectural approach—separating episodic, semantic, and emotional memory; implementing continuous consolidation; prioritizing context-appropriate retrieval; and maintaining explicit relationship models—reflects this belief.
We don't claim to have solved the memory problem. We claim to be working on it seriously, informed by research, guided by principles, and honest about uncertainty. The gap between current AI and genuine relational capability is large. Closing it will require sustained effort from many people and organizations.
We're publishing this perspective because we believe the conversation matters. How we build AI that remembers will shape the relationships millions of people have with AI systems. Getting this right is worth public discussion, diverse perspectives, and collective iteration.
Memory is how AI becomes relational. We're committed to building AI that truly remembers.