
Our Approach to AI Memory: Building the Identity Layer

A position paper on why memory architecture matters for relational AI and how we approach the problem

Promitheus Team · 18 min read

Key Findings

  1. Current LLMs show significant performance gaps on long-term memory tasks (published research shows GPT-4 at F1 = 32 vs. human F1 = 88 on the LOCOMO benchmark)
  2. The "lost in the middle" phenomenon causes 15-47% performance degradation when relevant information sits mid-context
  3. We believe memory, personality, and emotional state must be architected as unified systems, not bolted-on features
  4. Our approach separates episodic, semantic, and emotional memory, with different retention and retrieval strategies for each
  5. We prioritize relationship coherence over raw retrieval accuracy as our north-star metric

Abstract

Current AI systems face a fundamental limitation: they forget. Despite remarkable advances in language understanding and generation, most AI interactions begin from zero—no memory of previous conversations, no accumulated understanding, no relationship continuity. This isn't a minor inconvenience; it's an architectural constraint that prevents AI from forming genuine relationships with users. This position paper outlines Promitheus's approach to solving the memory problem for relational AI. We examine the limitations of current approaches documented in recent research, including the "lost in the middle" phenomenon, where LLMs struggle to use information in long contexts, and the significant gap between human and AI performance on long-term conversational memory tasks. We then describe our architectural philosophy: a system designed around the premise that memory, personality, and emotional continuity are inseparable components of AI identity. This is not an empirical benchmark study. We present our perspective on the problem space, informed by published research, and describe the principles guiding our technical approach. Our goal is to contribute to the broader conversation about what it means to build AI that truly remembers.

1. Introduction: The Memory Problem

Every meaningful human relationship is built on memory. We remember shared experiences, accumulate understanding of each other's preferences and patterns, and develop emotional context that shapes how we interact. Without memory, there is no relationship—only a series of disconnected encounters with strangers.

Current AI systems, despite their impressive capabilities, operate as strangers in every conversation. A user might spend hours sharing their hopes, fears, and daily experiences with an AI assistant, only to return the next day and find the slate wiped clean. This isn't a bug to be fixed with a larger context window; it's a fundamental architectural limitation.

Why This Matters

The implications extend beyond user experience. For AI companions, the inability to remember undermines the entire value proposition. For AI agents that need to act on users' behalf over time, statelessness makes continuity impossible. For any application where trust and understanding matter, starting from zero every session is a non-starter.

At Promitheus, we believe solving the memory problem is a prerequisite to building AI that can form genuine relationships. This paper describes our perspective on the problem and our approach to addressing it.

2. What Research Tells Us

Before describing our approach, we want to acknowledge the significant research that informs our thinking. We are building on the shoulders of others, and it's important to represent the current state of knowledge accurately.

The LOCOMO Benchmark

The LOCOMO (Long-term Conversational Memory) benchmark, published by Maharana et al. in 2024, provides one of the most rigorous evaluations of long-term conversational memory in LLMs. The benchmark tests memory over conversations spanning 300 turns and 9,000 tokens across up to 35 sessions.

The results are sobering: GPT-4 achieves an F1 score of 32.1 on question-answering tasks, while human performance reaches 87.9. Even with RAG augmentation, models improve by only 22-66% while still underperforming humans by approximately 56%. This isn't a small gap to be closed with incremental improvements—it's a fundamental capability difference.

The "Lost in the Middle" Phenomenon

Research from Stanford (Liu et al., 2023) documented what they call the "lost in the middle" problem: LLM performance degrades significantly when relevant information appears in the middle of the context rather than at the beginning or end. Performance drops of 15-47% have been observed as context length increases.

This finding has profound implications for memory systems. Simply stuffing more context into a prompt doesn't solve the memory problem—it may actually make retrieval less reliable for information that isn't positioned optimally.
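One practical response is position-aware context assembly: order retrieved memories so the strongest candidates sit at the start and end of the prompt, where models attend most reliably, and let weaker items sink toward the middle. The sketch below is a minimal illustration in Python; the function name and data shapes are assumptions for the example, not Promitheus internals.

```python
# Sketch: position-aware context assembly to mitigate "lost in the middle".
# Memories are (text, relevance) pairs; names are illustrative.

def assemble_context(memories: list[tuple[str, float]]) -> list[str]:
    """Order retrieved memories so the most relevant land at the
    edges of the prompt, where LLMs attend most reliably."""
    ranked = sorted(memories, key=lambda m: m[1], reverse=True)
    front, back = [], []
    for i, (text, _) in enumerate(ranked):
        # Alternate placement: the best items go to the start and end,
        # weaker items sink toward the middle.
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]

memories = [("promotion", 0.9), ("likes hiking", 0.4), ("works in marketing", 0.7)]
ordered = assemble_context(memories)  # strongest items at the edges
```

This doesn't make mid-context retrieval reliable; it only spends the model's positional attention budget on the memories that matter most.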

The 2025-2026 Memory Research Explosion

The past year has seen remarkable progress in agent memory research. A December 2025 survey, "Memory in the Age of AI Agents" (Liu et al.), documents how memory has become a core capability for foundation model-based agents, noting that traditional taxonomies like long/short-term memory are now insufficient to capture the diversity of contemporary systems.

Key advances include:

  • **Mem0** (April 2025) demonstrated a 26% relative improvement over OpenAI's memory feature on LOCOMO (66.9% vs. 52.9%), with 91% lower latency and 90% fewer tokens
  • **MemMachine v0.2** (December 2025) achieved industry-leading LOCOMO scores with ~80% reduction in token usage and 75% faster memory operations
  • **KVzip** (November 2025) from Seoul National University enables 3-4x memory compression while maintaining accuracy and doubling response speed
  • **Memoria** (December 2025) introduced a modular framework combining session-level summarization with weighted knowledge graph-based user modeling

Agentic Memory: The 2026 Frontier

The most recent research (January 2026) introduces "Agentic Memory" (AgeMem), a framework where LLM agents autonomously decide when to store, retrieve, update, summarize, or discard information. Rather than relying on hand-written rules, the agent learns memory management as part of its policy.

This represents a paradigm shift: memory operations become tool-based actions the agent can invoke, enabling unified long-term and short-term memory management. Early results suggest this approach outperforms static memory architectures on complex, multi-session tasks.

2.1 Published Research: Performance Gaps

The following visualization illustrates performance gaps documented in published research. These are not our benchmarks—they represent findings from the papers cited above.

LOCOMO Benchmark: Human vs LLM Performance

F1 scores on long-term conversational memory QA tasks (Source: Maharana et al., 2024)

[Bar chart: F1 scores on a 0-100 scale for Human Ceiling, GPT-4, GPT-3.5, Llama-2-70B, and Mistral-7B]

3. Our Architectural Philosophy

Given the research landscape, we've developed a set of principles that guide our approach to memory architecture. These are not claims about measured performance—they are design principles we believe will lead to better outcomes.

Principle 1: Memory is Identity

We don't treat memory as a feature to be added to a language model. We treat it as foundational to what the AI is. Our system is built around the premise that an AI's identity emerges from the accumulation of its memories, the consistency of its personality, and the continuity of its emotional state.

This means memory isn't an optimization problem—it's an identity problem. The question isn't "how do we retrieve relevant information?" but "how do we maintain coherent identity across time?"

Principle 2: Different Memories, Different Systems

Human memory isn't monolithic. We have episodic memory (what happened), semantic memory (what we know), and emotional memory (how things felt). These systems have different characteristics: episodic memories are specific and time-bound; semantic memories are abstracted and durable; emotional memories color our interpretation of everything else.

We believe AI memory systems should mirror this structure. A memory of a specific conversation ("last Tuesday, they told me about their promotion") should be handled differently than accumulated knowledge ("they work in marketing") or emotional context ("discussions of work are generally positive for them").
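The distinction can be made concrete by giving each memory kind its own record type with different fields and lifetimes. This is a minimal sketch; the field names are illustrative assumptions, not our production schema.

```python
# Sketch: three memory kinds as distinct record types.
# Field names are illustrative, not the actual Promitheus schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EpisodicMemory:
    """'What happened' - specific and time-bound."""
    event: str
    timestamp: datetime

@dataclass
class SemanticMemory:
    """'What we know' - abstracted and durable."""
    fact: str
    confidence: float = 1.0

@dataclass
class EmotionalMemory:
    """'How things felt' - colors interpretation of other memories."""
    topic: str
    valence: float  # -1.0 (negative) .. 1.0 (positive)

promo = EpisodicMemory("told me about their promotion", datetime(2026, 1, 13))
job = SemanticMemory("works in marketing")
work = EmotionalMemory("work", valence=0.6)
```

Because the types differ, retention and retrieval policies can differ too: episodic records can expire or consolidate, semantic records persist, and emotional records bias how the other two are used.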

Principle 3: Retrieval Must Be Context-Appropriate

The "lost in the middle" research tells us that raw retrieval isn't enough—where and how information is presented matters. But there's a deeper issue: not all retrievals are appropriate in all contexts.

Consider a memory about a user's deceased parent. This information might be highly relevant to a conversation about grief, but retrieving it during a casual chat about weekend plans could be jarring or harmful. Semantic similarity isn't the same as contextual appropriateness.

We believe retrieval systems must account for emotional context, conversational flow, and relationship dynamics—not just topical relevance.
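One way to operationalize this is to treat emotional fit as a gate as well as a weight: a memory that is topically relevant but emotionally jarring is suppressed outright. The scoring function below is a sketch under assumed weights and threshold, not a tuned system.

```python
# Sketch: multi-signal retrieval scoring with an emotional-fit gate.
# Weights and the 0.2 threshold are illustrative assumptions.

def retrieval_score(similarity: float, recency: float,
                    emotional_fit: float, coherence: float) -> float:
    """All inputs in [0, 1]. Emotional fit acts as a hard gate:
    an inappropriate memory scores zero even if highly topical."""
    if emotional_fit < 0.2:
        return 0.0
    weights = {"similarity": 0.4, "recency": 0.2,
               "emotional_fit": 0.2, "coherence": 0.2}
    return (weights["similarity"] * similarity
            + weights["recency"] * recency
            + weights["emotional_fit"] * emotional_fit
            + weights["coherence"] * coherence)

# A topically relevant but emotionally inappropriate memory is gated out.
score = retrieval_score(similarity=0.95, recency=0.9,
                        emotional_fit=0.1, coherence=0.8)
```

In the deceased-parent example above, the memory would score near the top on similarity during a grief conversation but be gated to zero during casual weekend-plans chat.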

Principle 4: Consolidation Over Accumulation

Human memory consolidates over time. Individual episodes become generalized knowledge. Details fade while patterns persist. This isn't a failure—it's a feature that prevents cognitive overload and surfaces what matters.

We believe AI memory systems need similar consolidation mechanisms. Without them, memory stores become cluttered with noise, retrieval quality degrades, and costs scale linearly with interaction history. The goal isn't to remember everything—it's to remember what matters.
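A minimal sketch of consolidation, assuming a simple support-count threshold (real signals would be much richer): episodic observations that recur are absorbed into semantic facts, while distinctive one-off events are preserved verbatim.

```python
# Sketch: consolidation by support count. The threshold and exact-match
# grouping are illustrative simplifications.
from collections import Counter

def consolidate(episodes: list[str], threshold: int = 3) -> tuple[list[str], list[str]]:
    """Return (semantic_facts, retained_episodes)."""
    counts = Counter(episodes)
    # Episodes that recur often enough become general knowledge.
    facts = [e for e, n in counts.items() if n >= threshold]
    # Rare, distinctive events are kept as episodic memories.
    retained = [e for e in episodes if counts[e] < threshold]
    return facts, retained

episodes = ["mentioned hiking"] * 3 + ["told me about their promotion"]
facts, retained = consolidate(episodes)
```

The effect is the one described above: repeated patterns stop occupying episodic storage, retrieval stays uncluttered, and costs no longer scale linearly with raw interaction history.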

Principle 5: Reflection as a First-Class Operation

Most memory systems are purely reactive—they only process memories when a user query triggers retrieval. We believe AI needs dedicated time and space for autonomous reflection: consolidating recent experiences, updating self-understanding, processing emotional states, and maintaining internal coherence without the pressure of immediate response.

This mirrors how human memory works during sleep and quiet moments. We don't just recall memories on demand—we actively process and integrate them. Dreams, daydreaming, and idle reflection all serve cognitive functions. AI systems that only think when prompted miss this entire dimension of intelligence.

In Promitheus, reflection is a first-class operation. The AI has scheduled opportunities to review recent interactions, identify patterns, update its understanding of the relationship, and evolve its emotional state. This isn't background optimization—it's core to how identity persists and develops over time.

This approach aligns with recent research on agentic memory (AgeMem, 2026), where agents autonomously decide when to store, retrieve, summarize, or discard information. But we go further: reflection isn't just memory management, it's the mechanism through which AI develops genuine continuity of self.
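A toy sketch of reflection as an explicit operation run off the request path: review recent episodes, derive patterns, and fold them into a self-model. The pattern detection here is deliberately trivial and all structures are illustrative assumptions.

```python
# Sketch: reflection as a scheduled pass, not a query side effect.
# The topic extraction and self-model shape are toy illustrations.

def reflect(recent_episodes: list[str], self_model: dict) -> dict:
    """Derive recurring topics from recent interactions and fold
    them into the agent's self-model, outside any user request."""
    updated = dict(self_model)
    for episode in recent_episodes:
        # Toy pattern detection: treat the last word as the topic
        # and count how often each topic recurs.
        topic = episode.split()[-1]
        updated[topic] = updated.get(topic, 0) + 1
    updated["last_reflection_count"] = len(recent_episodes)
    return updated

model = reflect(["talked about work", "talked about work",
                 "talked about hiking"], {})
```

The point of the sketch is scheduling, not the (trivial) analysis: reflection runs on its own cadence, so the self-model evolves even when no user query triggers retrieval.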

4. Promitheus Architecture Overview

Promitheus implements our architectural philosophy through several interconnected systems. We describe these at a conceptual level—specific implementation details will evolve as we learn.

Three-Layer Memory Model

We structure memory into three layers, each with distinct characteristics:

*Episodic Layer*: Stores specific events and interactions with rich temporal metadata. These memories answer "what happened?" and are tied to particular moments in time. Episodic memories consolidate into semantic memories over time.

*Semantic Layer*: Stores accumulated knowledge about the user—preferences, relationships, facts about their life. These memories answer "what do we know?" and are more stable than episodic memories.

*Emotional Layer*: Tracks affective patterns and relationship dynamics. This layer answers "how do things feel?" and influences both retrieval decisions and response generation.

Continuous Consolidation

Rather than batch processing, our system continuously consolidates memories. Recent interactions are processed immediately for important signals. Over time, episodic memories that reinforce existing patterns are absorbed into semantic memory, while those that represent significant events are preserved.

Context-Aware Retrieval

Every retrieval considers multiple factors: semantic relevance to the current query, temporal recency, emotional appropriateness given the conversational context, and relationship coherence. We don't optimize for a single metric—we balance multiple considerations that together determine what should be surfaced.

Relationship Coherence

We maintain an explicit model of the relationship between the AI and each user. This model tracks relationship stage, communication patterns, shared history, and current dynamics. It serves as a lens through which all memory operations are filtered.
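A minimal sketch of an explicit relationship model used as a filter on memory operations; the stage names and the volume-based transition rule are illustrative assumptions.

```python
# Sketch: an explicit per-user relationship model. Stages and the
# transition rule are illustrative, not the production logic.
from dataclasses import dataclass, field

@dataclass
class RelationshipModel:
    stage: str = "new"             # e.g. new -> familiar -> established
    session_count: int = 0
    shared_topics: set = field(default_factory=set)

    def record_session(self, topics: set) -> None:
        self.session_count += 1
        self.shared_topics |= topics
        # Toy transition driven purely by interaction volume; a real
        # model would weigh depth, sentiment, and recency as well.
        if self.session_count >= 10:
            self.stage = "established"
        elif self.session_count >= 3:
            self.stage = "familiar"

rel = RelationshipModel()
for _ in range(3):
    rel.record_session({"work"})
```

Memory operations can then consult this object: a "new" relationship might suppress deeply personal retrievals that an "established" one would surface freely.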

4.1 Conceptual Architecture

This diagram illustrates the conceptual flow of memory in our system. It represents our architectural approach, not empirical measurements.

AI Memory Evolution Timeline

  • 2017: Transformer architecture introduced (foundation for modern LLMs)
  • 2020: RAG systems emerge (external knowledge integration)
  • 2022: Vector databases mature (scalable similarity search)
  • 2023: Persistent memory experiments (long-term context retention)
  • 2024: Emotional state modeling (affective AI capabilities)
  • 2025: Integrated memory architectures (unified episodic/semantic/emotional memory)
  • 2026: Relational AI systems (true persistent relationships)

5. What We Don't Know Yet

Intellectual honesty requires acknowledging uncertainty. There's much we don't yet know about building effective memory systems for relational AI.

Optimal Consolidation Strategies

We believe consolidation is necessary, but the optimal strategies for when and how to consolidate remain open questions. How aggressively should episodic memories be compressed? What signals indicate a memory should be preserved versus absorbed? We have hypotheses, but empirical answers will come from deployment and iteration.

Measuring Relationship Quality

We've argued that relationship coherence matters more than raw retrieval accuracy, but measuring relationship quality is genuinely difficult. User satisfaction surveys capture something, but they're noisy and delayed. We're still developing metrics that capture what we actually care about.

Scaling Characteristics

We have architectural beliefs about how to handle scale, but we haven't yet operated at the scale where these beliefs will be truly tested. Memory systems that work well for thousands of users may behave differently at millions. We're designing for scale, but we'll learn the real lessons in production.

Emotional Appropriateness

We believe emotional context should influence retrieval, but defining and measuring "emotional appropriateness" is surprisingly hard. What's appropriate varies by user, relationship stage, and cultural context. We're building systems that can adapt, but the adaptation rules are still being refined.

Long-Term Relationship Dynamics

The research we've cited mostly examines memory over weeks or months. We're interested in relationships that span years. How memory systems should behave over truly long time horizons—how forgetting should work, how personality should evolve—remains largely unexplored territory.

6. Approaches to AI Memory: A Comparison

This tool compares different conceptual approaches to AI memory. The ratings represent our qualitative assessment of architectural trade-offs, not empirical benchmarks. Use this to understand the design space, not to evaluate specific systems.

Architecture Comparison Tool

Qualitative ratings on a 0-100 scale:

| Metric                    | Context Stuffing | Basic RAG |
|---------------------------|------------------|-----------|
| Long-term Retention       | 20               | 60        |
| Retrieval Speed           | 95               | 70        |
| Emotional Context         | 10               | 20        |
| Scalability               | 30               | 70        |
| Implementation Complexity | 10               | 40        |
| Average Score             | 33               | 52        |

7. Why We're Sharing This

We're publishing this position paper for several reasons.

To Contribute to the Conversation

The challenge of building AI that remembers is bigger than any one company. We benefit from published research, and we want to contribute our perspective back to the community. Even if our specific approaches don't become standard, we hope the principles and questions we raise are useful.

To Invite Scrutiny

Describing our approach publicly invites criticism and alternative perspectives. We're confident in our direction but not dogmatic about it. If our architectural principles are flawed, we'd rather learn that sooner through engagement with others thinking about these problems.

To Signal What We Value

By emphasizing relationship coherence over raw retrieval metrics, emotional appropriateness over semantic similarity, and identity preservation over feature accumulation, we're signaling what kind of AI we're trying to build. This helps users understand what to expect and helps potential collaborators understand whether our values align.

To Be Honest About Uncertainty

The AI industry often oversells capabilities and undersells limitations. We've tried to be explicit about what we know, what we believe, and what we're still figuring out. Building trust requires honesty, and honesty includes acknowledging uncertainty.

8. Conclusion

Memory is the foundation of relationship. Current AI systems, despite their remarkable capabilities, lack the persistent memory required for genuine relational continuity. Research demonstrates significant gaps between human and AI performance on memory tasks, and architectural limitations like the "lost in the middle" phenomenon suggest that simply scaling context windows won't solve the problem.

At Promitheus, we're building our system around a core belief: that memory, personality, and emotional continuity are inseparable components of AI identity. Our architectural approach—separating episodic, semantic, and emotional memory; implementing continuous consolidation; prioritizing context-appropriate retrieval; and maintaining explicit relationship models—reflects this belief.

We don't claim to have solved the memory problem. We claim to be working on it seriously, informed by research, guided by principles, and honest about uncertainty. The gap between current AI and genuine relational capability is large. Closing it will require sustained effort from many people and organizations.

We're publishing this perspective because we believe the conversation matters. How we build AI that remembers will shape the relationships millions of people have with AI systems. Getting this right is worth public discussion, diverse perspectives, and collective iteration.

Memory is how AI becomes relational. We're committed to building AI that truly remembers.

Note on Citations

This position paper draws on published research to establish the problem context. Key sources include:

  • Maharana et al. (2024) - LOCOMO benchmark for long-term conversational memory
  • Liu et al. (2023) - "Lost in the Middle" analysis of context position effects
  • Liu et al. (2025) - "Memory in the Age of AI Agents" comprehensive survey
  • Mem0 Research (2025) - Production memory system performance
  • AgeMem (2026) - Agentic memory framework for unified LTM/STM management
  • Memoria (2025) - Scalable agentic memory with knowledge graphs

All performance numbers cited (F1 scores, accuracy improvements, latency reductions) are from these published sources, not internal Promitheus benchmarks. We have not yet published empirical benchmarks for Promitheus. Full citation details are provided in the References section.

References

[1] Maharana, A., Lee, D., Tulyakov, S., & Bansal, M. "Evaluating Very Long-Term Conversational Memory of LLM Agents." ACL 2024.
[2] Liu, N., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. "Lost in the Middle: How Language Models Use Long Contexts." arXiv preprint arXiv:2307.03172, 2023.
[3] Liu, S., et al. "Memory in the Age of AI Agents: A Survey." arXiv preprint arXiv:2512.13564, 2025.
[4] Mem0 Research Team. "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory." arXiv preprint arXiv:2504.19413, 2025.
[5] Xu, Z., et al. "Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents." arXiv preprint arXiv:2601.01885, 2026.
[6] Memoria Research Team. "Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI." arXiv preprint arXiv:2512.12686, 2025.
[7] A-MEM Research Team. "A-MEM: Agentic Memory for LLM Agents." arXiv preprint arXiv:2502.12110, 2025.
[8] Park, J.S., O'Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., & Bernstein, M.S. "Generative Agents: Interactive Simulacra of Human Behavior." UIST 2023.
