Paper • arXiv 2026
Persistent Memory for Agentic Workflows
We introduce a hierarchical memory architecture that achieves 94% retention accuracy across heterogeneous agent sessions. Our approach combines episodic memory buffers with compressed working memory representations, enabling agents to recall and apply knowledge from prior sessions without catastrophic forgetting. We evaluate it on 10k+ agent trajectories across code engineering tasks.
Paper • ICML 2026
Tool-Augmented Reasoning at Scale
We show that structured tool-use during chain-of-thought reasoning improves accuracy by 37% on complex software engineering benchmarks. Our method interleaves natural language reasoning with API calls to compilers, linters, test runners, and documentation retrievers within a unified thought loop.
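As a rough illustration of interleaving reasoning with tool calls in a single loop, here is a minimal Python sketch. It is not the paper's method. The names `thought_loop` and `scripted_policy`, the stubbed `run_tests` tool, and the "think" / "call" / "answer" step format are all assumptions made for this example, and a fixed script stands in for the model.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A step is (kind, payload); kind is "think", "call", or "answer".
# This step format is an assumption for illustration only.
Step = Tuple[str, str]


def thought_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> List[str]:
    """Run one loop in which reasoning steps and tool calls share a transcript."""
    transcript: List[str] = []
    for _ in range(max_steps):
        kind, payload = policy(transcript)
        if kind == "answer":
            transcript.append(f"answer: {payload}")
            break
        if kind == "call":
            # Payload convention (hypothetical): "<tool_name> <argument>".
            name, _, arg = payload.partition(" ")
            tool = tools.get(name)
            result = tool(arg) if tool else f"unknown tool {name!r}"
            transcript.append(f"call: {payload}")
            # The tool result is fed back so later steps can condition on it.
            transcript.append(f"observe: {result}")
        else:
            transcript.append(f"think: {payload}")
    return transcript


def scripted_policy(steps: Iterable[Step]) -> Callable[[List[str]], Step]:
    """Stand-in for a model: replays a fixed list of steps, ignoring the transcript."""
    iterator = iter(steps)
    return lambda transcript: next(iterator)


# Hypothetical tool stub; a real system would invoke an actual test runner.
tools = {"run_tests": lambda target: f"{target}: 3 passed, 0 failed"}

transcript = thought_loop(
    scripted_policy([
        ("think", "the change touches the parser, so run its tests"),
        ("call", "run_tests parser"),
        ("answer", "tests pass"),
    ]),
    tools,
)
```

The point of the sketch is that tool observations land in the same transcript as the reasoning steps, so each later step can depend on them.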
Model • 2026
Nexus-1: Agent Foundation Model
Our flagship foundation model achieves state-of-the-art results on SWE-Bench, Tool-Use, and Multi-Step Reasoning benchmarks. Nexus-1 is trained on 3M+ trajectories of tool-mediated problem solving using a novel two-stage curriculum: supervised fine-tuning on expert demonstrations followed by RL from tool-use feedback.
Whitepaper • 2025
The Case for Agent-Native Infrastructure
Why the next generation of AI requires a fundamentally new runtime. We analyze the limitations of chat-based interfaces layered on legacy systems and propose a set of architectural principles for building infrastructure designed for autonomous agents from the ground up.
Blog • 2026
Evaluating Agent Reliability: A Practical Framework
A systematic methodology for measuring and improving the reliability of autonomous AI agents in production. We introduce coverage metrics, failure mode taxonomies, and a continuous evaluation pipeline that runs against every deployment.
Blog • 2026
Building Agents That Remember: Lessons from Production
Engineering lessons from deploying persistent memory across 10k+ agent sessions in production. We cover memory compaction strategies, retrieval latency, conflict resolution, and the surprising failure modes of long-lived agents.
Workshop • 2026
Multi-Agent Coordination via Shared Memory Graphs
We extend persistent memory to multi-agent settings, showing that shared memory graphs enable teams of agents to coordinate, delegate, and resolve conflicts without centralized orchestration.
Preprint • 2026
Safety-Critical Agent Behavior via Constrained Decoding
A method for enforcing operational constraints during agentic decoding, guaranteeing that generated actions satisfy pre-defined safety policies without requiring post-hoc filtering.
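The general idea of constraining decoding, as opposed to filtering afterwards, can be sketched in a few lines of Python. This is a toy illustration, not the preprint's method. The function names, the action vocabulary, and the example policy are all hypothetical, and a fixed score table stands in for the model.

```python
from typing import Callable, Dict, List


def constrained_greedy_decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    is_allowed: Callable[[List[str], str], bool],
    max_steps: int,
    stop_action: str = "STOP",
) -> List[str]:
    """Greedy decoding in which the policy check runs before each choice."""
    actions: List[str] = []
    for _ in range(max_steps):
        scores = score_fn(actions)
        # Mask step: discard every candidate the policy rejects, so a
        # disallowed action can never be selected in the first place.
        allowed = {a: s for a, s in scores.items() if is_allowed(actions, a)}
        if not allowed:
            break
        best = max(allowed, key=allowed.get)
        if best == stop_action:
            break
        actions.append(best)
    return actions


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model: the unsafe action always scores highest."""
    return {"delete_branch": 3.0, "run_tests": 2.0, "open_pr": 1.0, "STOP": 0.0}


def toy_policy(prefix: List[str], action: str) -> bool:
    """Hypothetical safety policy: never delete, never repeat an action."""
    return action != "delete_branch" and action not in prefix


plan = constrained_greedy_decode(toy_scores, toy_policy, max_steps=5)
```

Because the mask is applied before selection, every emitted action satisfies the policy by construction. In this toy setup the unconstrained top choice, `delete_branch`, is never produced even though it has the highest score.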