
Module 1: Multi-Agent Architecture (Detailed)

Multi-agent systems distribute work across specialized agents, each with its own tools, memory, and configuration. Three architectural patterns dominate production deployments.

Orchestrated: A central coordinator agent decomposes tasks and assigns them to worker agents. Best for hierarchical workflows with clear task boundaries. Trade-off: single point of failure, limited emergent behavior.

Shared memory: Agents coordinate through a shared graph structure (Shared Memory Graph). Best for collaborative knowledge work where agents build on each other's insights. Trade-off: requires convergence time for the shared graph to stabilize.

Message-passing: Agents communicate directly through typed messages. Best for loosely coupled systems with independent agents. Trade-off: complex protocol design, O(n^2) communication overhead.
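The O(n^2) overhead of message-passing comes from counting agent pairs. A minimal sketch, assuming the worst case in which every agent may need a direct channel to every other agent (the function name is illustrative and not part of any Nexus API):

```python
def pairwise_channels(n_agents: int) -> int:
    """Count the distinct agent pairs, i.e. n * (n - 1) / 2."""
    if n_agents < 0:
        raise ValueError("n_agents must be non-negative")
    return n_agents * (n_agents - 1) // 2


if __name__ == "__main__":
    # The pair count grows quadratically with team size.
    for n in (4, 10, 20, 50):
        print(f"{n} agents: {pairwise_channels(n)} possible channels")
```

Going from 4 to 50 agents raises the pair count from 6 to 1225, which is why direct messaging suits small, loosely coupled teams.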

Choosing an architecture. Consider: task structure (hierarchical → orchestrated, collaborative → shared memory, independent → message-passing), team size (2-4 agents: any pattern works; 5-20 agents: shared memory scales best; 20+: hybrid patterns needed), failure tolerance (mission-critical: shared memory with redundancy; best-effort: orchestrated is simpler), and emergence requirement (if you want agents to discover new coordination patterns, shared memory is the only option).
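The selection criteria above can be written as a small decision function. This is a sketch only: the function name, the parameter names, and the order in which the criteria are checked are assumptions, since the text does not say which criterion wins when they disagree.

```python
from enum import Enum


class Pattern(Enum):
    ORCHESTRATED = "orchestrated"
    SHARED_MEMORY = "shared memory"
    MESSAGE_PASSING = "message-passing"
    HYBRID = "hybrid"


# Mapping from task structure to pattern, as listed in the text.
BY_TASK_STRUCTURE = {
    "hierarchical": Pattern.ORCHESTRATED,
    "collaborative": Pattern.SHARED_MEMORY,
    "independent": Pattern.MESSAGE_PASSING,
}


def suggest_pattern(task_structure: str, team_size: int,
                    needs_emergence: bool = False,
                    mission_critical: bool = False) -> Pattern:
    """Suggest a pattern; the precedence of the checks is an assumption."""
    if needs_emergence:
        return Pattern.SHARED_MEMORY      # "the only option" for emergence
    if team_size > 20:
        return Pattern.HYBRID             # 20+ agents: hybrid patterns
    if mission_critical or team_size >= 5:
        return Pattern.SHARED_MEMORY      # redundancy, or 5-20 agents
    return BY_TASK_STRUCTURE[task_structure]  # 2-4 agents: follow the task


if __name__ == "__main__":
    print(suggest_pattern("hierarchical", team_size=3))
    print(suggest_pattern("independent", team_size=12))
```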

Module 2: Shared Memory Graphs (Detailed)

Shared Memory Graphs (SMG) are the default coordination mechanism for multi-agent Nexus deployments. The graph is a directed attributed graph where nodes represent tasks, knowledge, and artifacts, and edges represent relationships.

Node types in detail. Task nodes store: task description, current status (pending, in_progress, blocked, completed), assigned agent, priority, dependencies, and execution history. Knowledge nodes store: factual statements, source agent, confidence score (0-1), creation timestamp, last verified timestamp, and evidence links. Artifact nodes store: file paths or object references, format metadata, owning agent, and version history.
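One way to picture the three node types is as plain data classes. The field names below are assumptions derived from the description above; they are not the actual Nexus schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    COMPLETED = "completed"


@dataclass
class TaskNode:
    description: str
    status: TaskStatus = TaskStatus.PENDING
    assigned_agent: Optional[str] = None
    priority: int = 0
    dependencies: list[str] = field(default_factory=list)       # task IDs
    execution_history: list[str] = field(default_factory=list)


@dataclass
class KnowledgeNode:
    statement: str
    source_agent: str
    confidence: float                       # must lie in [0, 1]
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    last_verified_at: Optional[datetime] = None
    evidence_links: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject confidence scores outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


@dataclass
class ArtifactNode:
    reference: str                          # file path or object reference
    format_metadata: dict[str, str] = field(default_factory=dict)
    owning_agent: Optional[str] = None
    version_history: list[str] = field(default_factory=list)


if __name__ == "__main__":
    print(KnowledgeNode("port is 3000", source_agent="agent-1", confidence=0.9))
```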

Edge semantics. Delegation (agent A delegates subtask to agent B): enables work distribution without a central scheduler. Dependency (task X depends on task Y): enables automatic workflow sequencing. Informed-by (agent used knowledge K): enables provenance tracking. Conflict (knowledge K1 contradicts K2): enables automatic detection of disagreements. Consensus (agents agree on knowledge K): enables convergence to shared truth.
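To illustrate how typed edges enable workflow sequencing and conflict detection, the sketch below models edges as tuples and derives a task order from the dependency edges with the standard-library graphlib module. The Edge type and the helper names are hypothetical.

```python
from enum import Enum
from graphlib import TopologicalSorter
from typing import NamedTuple


class EdgeType(Enum):
    DELEGATION = "delegation"
    DEPENDENCY = "dependency"
    INFORMED_BY = "informed_by"
    CONFLICT = "conflict"
    CONSENSUS = "consensus"


class Edge(NamedTuple):
    source: str
    target: str
    type: EdgeType


def execution_order(edges: list[Edge]) -> list[str]:
    """Order tasks so that every task comes after the tasks it depends on."""
    sorter: TopologicalSorter = TopologicalSorter()
    for edge in edges:
        if edge.type is EdgeType.DEPENDENCY:
            # The edge source depends on the edge target.
            sorter.add(edge.source, edge.target)
    return list(sorter.static_order())


def open_conflicts(edges: list[Edge]) -> list[tuple[str, str]]:
    """List the pairs of knowledge nodes joined by a conflict edge."""
    return [(e.source, e.target) for e in edges if e.type is EdgeType.CONFLICT]


if __name__ == "__main__":
    graph = [
        Edge("deploy", "test", EdgeType.DEPENDENCY),
        Edge("test", "build", EdgeType.DEPENDENCY),
        Edge("K1", "K2", EdgeType.CONFLICT),
    ]
    print(execution_order(graph))
    print(open_conflicts(graph))
```

A cycle among dependency edges makes static_order raise graphlib.CycleError, which doubles as a basic sanity check on the workflow.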

Module 3: Coordination Protocols (Detailed)

Beyond the primitive graph operations, teams of agents develop coordination protocols.

Task discovery: When an agent creates a task node, other agents can discover it through graph traversal. An agent looking for work subscribes to new task nodes in domains matching its skills.

Work stealing: If an agent completes its assigned tasks and sees pending tasks in another agent's queue, it can autonomously pick up work by adding a delegation edge to itself. This balances workload without central scheduling.

Knowledge cross-pollination: When an agent learns a new pattern, the knowledge node propagates through the graph. Other agents working on similar tasks discover it through informed-by edges and can apply it without explicit teaching. In our production data, knowledge cross-pollination reduces duplicate learning by 67%.
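The work-stealing rule can be sketched in a few lines. Everything here is hypothetical (the Task type, the tuple form of a delegation edge, and the in-memory lists), and a real shared graph would need an atomic claim so that two idle agents cannot steal the same task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: str
    domain: str
    status: str = "pending"
    assigned_agent: Optional[str] = None


# A delegation edge as (from_agent, to_agent, task_id); illustrative only.
DelegationEdge = tuple[str, str, str]


def steal_work(agent_id: str, skills: set[str], tasks: list[Task],
               delegation_edges: list[DelegationEdge]) -> Optional[Task]:
    """Let an idle agent take one pending task from another agent's queue."""
    # An agent that still has unfinished work of its own does not steal.
    if any(t.assigned_agent == agent_id and t.status != "completed"
           for t in tasks):
        return None
    for task in tasks:
        owned_by_other = task.assigned_agent not in (None, agent_id)
        if task.status == "pending" and owned_by_other and task.domain in skills:
            # Record the hand-over as a delegation edge pointing at the thief.
            delegation_edges.append((task.assigned_agent, agent_id, task.task_id))
            task.assigned_agent = agent_id
            task.status = "in_progress"
            return task
    return None


if __name__ == "__main__":
    queue = [Task("t1", "api", "completed", "agent-a"),
             Task("t2", "api", "pending", "agent-b")]
    edges: list[DelegationEdge] = []
    print(steal_work("agent-a", {"api"}, queue, edges), edges)
```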

Handling coordination failures. Common failure modes:

Deadlock (two agents each wait for the other): detected by graph cycle analysis, resolved by timeout-based escalation.
Knowledge staleness (agent uses outdated knowledge): confidence scores decay over time, triggering re-verification.
Task abandonment (agent fails and leaves tasks incomplete): heartbeat-based failure detection reassigns tasks after timeout.
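Two of these mechanisms are easy to sketch. Deadlock detection reduces to finding a cycle in a "waits for" graph, and staleness can be modeled as a confidence score that decays with age. The exponential half-life decay and the 0.5 threshold below are assumptions; the text only states that confidence decays over time.

```python
from typing import Optional


def find_wait_cycle(waits_for: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle of agents waiting on each other, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color: dict[str, int] = {}
    path: list[str] = []

    def visit(agent: str) -> Optional[list[str]]:
        color[agent] = GRAY
        path.append(agent)
        for other in waits_for.get(agent, []):
            state = color.get(other, WHITE)
            if state == GRAY:         # back edge: the path loops onto itself
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[agent] = BLACK
        return None

    for agent in waits_for:
        if color.get(agent, WHITE) == WHITE:
            cycle = visit(agent)
            if cycle:
                return cycle
    return None


def decayed_confidence(confidence: float, age_hours: float,
                       half_life_hours: float = 24.0) -> float:
    """Halve the confidence every half_life_hours (assumed decay model)."""
    return confidence * 0.5 ** (age_hours / half_life_hours)


def needs_reverification(confidence: float, age_hours: float,
                         threshold: float = 0.5) -> bool:
    """Flag knowledge whose decayed confidence fell below the threshold."""
    return decayed_confidence(confidence, age_hours) < threshold


if __name__ == "__main__":
    print(find_wait_cycle({"agent-a": ["agent-b"], "agent-b": ["agent-a"]}))
    print(needs_reverification(0.8, age_hours=24.0))
```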

Exercises

Exercise 1: Deploy two agents sharing a knowledge base. Give them related but independent tasks. Check whether knowledge learned by agent 1 appears in agent 2's memory.

Exercise 2: Create a three-agent team with one coordinator and two workers using the orchestrated architecture. Measure task completion time against a single-agent baseline.

Exercise 3: Introduce conflicting knowledge (agent 1 stores "port is 3000", agent 2 stores "port is 4000"). Observe the conflict resolution process. How long does convergence take?

Nexus Academy

Multi-Agent Systems

Design patterns for coordinating multiple agents: delegation, shared memory, and conflict resolution.

Prerequisites

Course: Agent Engineering Fundamentals
Experience deploying single-agent systems
Understanding of distributed systems concepts

Learning outcomes

Design multi-agent system architectures
Implement shared memory and delegation patterns
Handle conflicts and disagreements between agents
Deploy and operate multi-agent teams in production
Evaluate the costs and benefits of multi-agent approaches

Course modules

Module 1

Why Multi-Agent?

Limitations of single-agent systems. When to use multiple agents: parallelization, specialization, fault tolerance. Coordination vs. orchestration.

Duration: 45 min

Module 2

Shared Memory and Context

Shared Memory Graphs for multi-agent coordination. Knowledge sharing, task dependencies, and information flow between agents.

Duration: 60 min

Module 3

Delegation Patterns

Task decomposition across agents. Skill-based routing, hierarchical delegation, and peer-to-peer collaboration patterns.

Duration: 45 min

Module 4

Conflict Resolution

Detecting and resolving conflicts between agents. Consensus mechanisms, source-based priority, and human escalation.

Duration: 45 min

Module 5

Production Multi-Agent Systems

Designing, deploying, and operating multi-agent teams at scale. Monitoring inter-agent communication, resource allocation, and team composition.

Duration: 60 min

Module 4: Scaling Multi-Agent Systems (Detailed)

As agent teams grow beyond 5-10 agents, coordination overhead becomes the primary bottleneck. Three scaling strategies:

Graph partitioning: divide the shared memory graph into sub-graphs by domain or team. Agents within a sub-graph share context freely; cross-sub-graph communication goes through gateway agents. This reduces graph size per agent from O(N) to O(N/K), where K is the number of partitions.

Hierarchical coordination: use a two-level graph, with team-level graphs for intra-team coordination and an organization-level graph for cross-team knowledge sharing. The organization graph only contains high-level knowledge and artifact references.

Asynchronous reconciliation: agents work independently and reconcile their memory graphs periodically (every 15 minutes) rather than synchronously. This reduces coordination overhead by 60% at the cost of temporary inconsistency.
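A minimal sketch of graph partitioning by domain follows. It assumes each node carries a single domain label and that partitions are roughly balanced, which is what makes the per-agent graph size close to N/K. Edges that cross a partition boundary are the ones a gateway agent would carry. All names are illustrative.

```python
from collections import defaultdict


def partition_by_domain(node_domains: dict[str, str]) -> dict[str, set[str]]:
    """Group node IDs into one sub-graph per domain label."""
    partitions: dict[str, set[str]] = defaultdict(set)
    for node_id, domain in node_domains.items():
        partitions[domain].add(node_id)
    return dict(partitions)


def cross_partition_edges(edges: list[tuple[str, str]],
                          node_domains: dict[str, str]) -> list[tuple[str, str]]:
    """Edges whose endpoints lie in different partitions (gateway traffic)."""
    return [(src, dst) for src, dst in edges
            if node_domains[src] != node_domains[dst]]


if __name__ == "__main__":
    domains = {"t1": "backend", "t2": "backend", "k1": "backend",
               "t3": "frontend", "t4": "frontend", "k2": "frontend"}
    links = [("t1", "t2"), ("t2", "k1"), ("t3", "k2"), ("k1", "t4")]
    parts = partition_by_domain(domains)
    # With N = 6 nodes and K = 2 balanced partitions, each holds N / K = 3.
    print({name: len(nodes) for name, nodes in parts.items()})
    print(cross_partition_edges(links, domains))
```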

Performance characteristics. In our production deployment of 50+ agents across 4 teams, coordination overhead as a share of total actions was:

10 agents: 4.7%
30 agents (single graph without partitioning): 8.1%
50 agents (with partitioning): 6.2%
50 agents (without partitioning): 14.3%

Partitioning becomes cost-effective above 20 agents. Below 20 agents, the operational complexity of partitioning outweighs the overhead savings.

Module 5: Production Deployment and Monitoring (Detailed)

Deploying multi-agent systems to production requires additional considerations beyond single-agent deployment.

Agent discovery: Agents need to find each other. Use the Nexus agent registry: nexus agent list --team backend lists all agents in a team. Agents automatically register when deployed.

Health monitoring: Each agent exposes a health endpoint. The platform monitors agent availability (heartbeat every 5 seconds), task throughput (tasks completed per hour), error rate (failed tasks / total tasks), and memory utilization (entries per tier, storage used).

Graceful degradation: When an agent fails, its tasks are redistributed. Configure the task reassignment policy: immediate (tasks are reassigned as soon as failure is detected), on-completion (in-progress tasks complete before reassignment), or manual (human decides).
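The monitoring and degradation rules above can be sketched as follows. The 5-second heartbeat interval comes from the text; the three-missed-beats failure threshold, the function names, and the reading of on-completion as "leave in-progress tasks alone" are assumptions, not documented Nexus behavior.

```python
from enum import Enum


class ReassignmentPolicy(Enum):
    IMMEDIATE = "immediate"
    ON_COMPLETION = "on-completion"
    MANUAL = "manual"


HEARTBEAT_INTERVAL_S = 5.0        # from the text
MISSED_BEATS_BEFORE_FAILURE = 3   # assumption: tolerate brief network hiccups


def failed_agents(last_heartbeat: dict[str, float], now: float) -> list[str]:
    """Agents whose last heartbeat is older than the failure timeout."""
    timeout = HEARTBEAT_INTERVAL_S * MISSED_BEATS_BEFORE_FAILURE
    return sorted(agent for agent, seen in last_heartbeat.items()
                  if now - seen > timeout)


def error_rate(failed_tasks: int, total_tasks: int) -> float:
    """Failed tasks divided by total tasks; 0.0 when nothing has run yet."""
    return failed_tasks / total_tasks if total_tasks else 0.0


def tasks_to_reassign(task_status: dict[str, str],
                      policy: ReassignmentPolicy) -> list[str]:
    """Pick which of a failed agent's tasks to hand to other agents now."""
    if policy is ReassignmentPolicy.MANUAL:
        return []                 # a human decides
    unfinished = [t for t, s in task_status.items() if s != "completed"]
    if policy is ReassignmentPolicy.ON_COMPLETION:
        # In-progress tasks are left to complete before being reassigned.
        return [t for t in unfinished if task_status[t] != "in_progress"]
    return unfinished             # IMMEDIATE


if __name__ == "__main__":
    beats = {"agent-a": 100.0, "agent-b": 80.0}
    print(failed_agents(beats, now=100.0))
    print(tasks_to_reassign({"t1": "in_progress", "t2": "pending"},
                            ReassignmentPolicy.ON_COMPLETION))
```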