Nexus Academy

Agent Engineering Fundamentals

Learn the core concepts: tool use, memory, planning, and the agent lifecycle. The foundation course for every Nexus developer.

Prerequisites

Familiarity with Git and a programming language (TypeScript, Python, or Go)
Nexus CLI installed (see Step 1 in Getting Started)
A Nexus account (free tier available)

Learning outcomes

Understand the core components of agent systems
Configure tools, memory, and guardrails in nexus.yaml
Deploy and interact with agents via CLI and SDK
Design safe, reliable agent workflows for real tasks
Evaluate agent performance and troubleshoot failures

Course modules

Module 1

Introduction to Agent Systems

What makes an AI agent different from a language model. Core concepts: autonomy, tool use, planning, and memory. The agent lifecycle from task assignment to completion.

Duration: 45 min

Module 2

Tool-Use Architecture

How agents interact with tools. Tool schemas, structured inputs and outputs, error handling, and tool selection strategies. Building your first custom tool.

Duration: 60 min

Module 3

Planning and Reasoning

Multi-step planning, subgoal decomposition, error recovery, and plan validation. Understanding chain-of-thought and tool-augmented reasoning.

Duration: 60 min

Module 4

Memory and Context

How agents remember and forget. Episodic memory, working memory, and long-term consolidation. Configuring memory for your use case.

Duration: 45 min

Module 5

Agent Safety and Guardrails

Setting boundaries, review gates, and approval workflows. Safe tool configuration, path restrictions, and capability-based security.

Duration: 45 min

Module 6

Capstone: Build an Agent

Apply everything you have learned. Design, configure, deploy, and evaluate an agent for a real engineering task.

Duration: 90 min

Module 1: Introduction to Agent Systems (Detailed)

An AI agent is an autonomous system that uses a language model as its reasoning core, augmented with tools, memory, and planning capabilities to accomplish complex tasks. Unlike a raw language model, which only generates text, an agent can execute code, read and write files, search the web, interact with APIs, and persist knowledge across sessions. The fundamental difference is agency: the ability to act on the world rather than only produce text.

The agent lifecycle. Every task follows the same lifecycle. Perception: the agent receives a task description and collects relevant context from tools and memory. Reasoning: the model processes this context and generates a plan with specific steps. Action: the agent executes each step using tools, observing the results. Memory: outcomes are stored for future reference.
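The lifecycle can be pictured as one loop over the four phases. Below is a minimal sketch under stated assumptions: `Agent`, `Step`, and `runTask` are hypothetical names (not the Nexus SDK), the `plan` callback stands in for the language model, and memory retrieval is reduced to a naive substring match.

```typescript
// Hypothetical types for illustration only; not the Nexus SDK.
type Tool = (input: string) => string;

interface Step {
  tool: string; // which tool to call
  input: string; // argument passed to the tool
}

interface Agent {
  tools: Record<string, Tool>;
  memory: string[]; // outcomes of earlier tasks
  // Stand-in for the language model: turns a task plus context into steps.
  plan: (task: string, context: string[]) => Step[];
}

function runTask(agent: Agent, task: string): string[] {
  // Perception: collect the task and any related memories (naive match).
  const context = agent.memory.filter((m) => m.includes(task));

  // Reasoning: ask the "model" for a concrete plan.
  const steps = agent.plan(task, context);

  // Action: execute each step with its tool and record what happened.
  const observations = steps.map((step) => {
    const tool: Tool | undefined = agent.tools[step.tool];
    return tool ? tool(step.input) : `error: unknown tool ${step.tool}`;
  });

  // Memory: store the outcome so later tasks can draw on it.
  agent.memory.push(`${task} -> ${observations.join("; ")}`);
  return observations;
}

// Usage: a toy agent whose only tool upper-cases its input.
const demoAgent: Agent = {
  tools: { shout: (text) => text.toUpperCase() },
  memory: [],
  plan: (task) => [{ tool: "shout", input: task }],
};
console.log(runTask(demoAgent, "hello"));
```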

Key concepts. Autonomy: agents operate without step-by-step human guidance. Tool use: agents interact with external systems through well-defined APIs. Planning: agents decompose complex goals into manageable steps. Memory: agents retain and apply knowledge across sessions. These four capabilities define the modern agent paradigm.

Module 2: Tool-Use Architecture (Detailed)

Tools are the bridge between an agent's reasoning and the external world. Each tool has a typed schema defining its inputs and outputs. The agent selects which tool to call based on the task context and the tool's documentation.

Tool schema example (TypeScript):

const searchFilesTool = {
  name: "search_files",
  description: "Search for files matching a pattern",
  // Typed inputs the model must supply when it calls the tool
  parameters: {
    pattern: { type: "string", description: "Glob pattern" },
    root_dir: { type: "string", optional: true },
    max_results: { type: "number", default: 50 }
  },
  // Shape of the result handed back to the agent
  returns: {
    files: { type: "array", items: { type: "string" } }
  }
};
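Before a tool runs, the arguments the model proposes can be checked against this schema. The sketch below assumes parameter specs shaped like `search_files` above; `ParamSpec` and `validateArgs` are hypothetical helpers, not a Nexus API. It applies defaults and rejects missing required parameters, wrong types, and undeclared parameters, so the agent gets a precise error it can correct.

```typescript
// Hypothetical validator for schemas shaped like search_files; not a Nexus API.
interface ParamSpec {
  type: "string" | "number" | "boolean";
  description?: string;
  optional?: boolean;
  default?: string | number | boolean;
}

type ArgValue = string | number | boolean;
type Args = Record<string, ArgValue>;

function validateArgs(params: Record<string, ParamSpec>, raw: Args): Args {
  const out: Args = {};
  for (const name of Object.keys(params)) {
    const spec = params[name];
    const value: ArgValue | undefined = raw[name];
    if (value === undefined) {
      // Missing: use the default, skip if optional, otherwise fail.
      if (spec.default !== undefined) out[name] = spec.default;
      else if (!spec.optional) throw new Error(`missing required parameter: ${name}`);
      continue;
    }
    if (typeof value !== spec.type) {
      throw new Error(`parameter ${name} must be a ${spec.type}, got ${typeof value}`);
    }
    out[name] = value;
  }
  // Reject parameters the schema does not declare (e.g. a misspelled name).
  for (const name of Object.keys(raw)) {
    if (!(name in params)) throw new Error(`unknown parameter: ${name}`);
  }
  return out;
}

const searchFilesParams: Record<string, ParamSpec> = {
  pattern: { type: "string", description: "Glob pattern" },
  root_dir: { type: "string", optional: true },
  max_results: { type: "number", default: 50 },
};

console.log(validateArgs(searchFilesParams, { pattern: "**/*.ts" }));
```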

Tool selection strategy. The agent uses semantic matching between the task description and each tool's documentation to select a tool. For the task "find all TODO comments", search_files scores a semantic similarity of 0.87 while run_tests scores 0.12, so the agent selects search_files. Confidence thresholds are configurable: below 0.5, the agent asks for clarification; from 0.5 to 0.7, it tries the best match and keeps a fallback ready; above 0.7, it proceeds with the best match.
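The thresholds above can be expressed as a small decision function. In this sketch the similarity scores are passed in directly; one common way to obtain them is the cosine similarity between an embedding of the task and an embedding of each tool's documentation, which `cosineSimilarity` computes. The function names, the `Decision` type, and the use of embeddings are assumptions for illustration, not the Nexus implementation.

```typescript
// Illustrative only: names mirror the text, not a Nexus API.

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

type Decision =
  | { kind: "clarify" } // best score below `low`: ask the user
  | { kind: "try_with_fallback"; tool: string; fallback?: string }
  | { kind: "proceed"; tool: string };

// Map per-tool similarity scores to one of the three behaviors.
function selectTool(scores: Record<string, number>, low = 0.5, high = 0.7): Decision {
  const ranked = Object.keys(scores).sort((x, y) => scores[y] - scores[x]);
  if (ranked.length === 0 || scores[ranked[0]] < low) return { kind: "clarify" };
  const best = ranked[0];
  if (scores[best] <= high) {
    // Medium confidence: keep the runner-up as a fallback.
    return { kind: "try_with_fallback", tool: best, fallback: ranked[1] };
  }
  return { kind: "proceed", tool: best };
}

// Returns { kind: "proceed", tool: "search_files" }
console.log(selectTool({ search_files: 0.87, run_tests: 0.12 }));
```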

Error handling patterns. Tool calls can fail. Implement these strategies: retry with exponential backoff (for transient failures like network timeouts); parameter adjustment (for validation errors, the agent adjusts parameters and retries); fallback tool (if the primary tool fails, the agent tries a semantically similar alternative); human escalation (after N failures, the agent asks for help).
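A minimal sketch of three of these strategies chained together: retry with exponential backoff, then a fallback tool, then human escalation. Parameter adjustment is omitted because it requires the model to rewrite the call. `ToolFailure`, `callWithRecovery`, the 200 ms base delay, and the 5 s cap are hypothetical choices; the `wait` parameter exists so tests can skip real sleeping.

```typescript
// Hypothetical failure shape: `transient` marks errors worth retrying.
interface ToolFailure {
  transient: boolean;
  message: string;
}

type ToolCall = () => Promise<string>;

function isTransient(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as ToolFailure).transient === true;
}

// Exponential backoff: 200 ms, 400 ms, 800 ms, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithRecovery(
  primary: ToolCall,
  fallback: ToolCall | undefined,
  escalate: (reason: string) => Promise<string>,
  maxRetries = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<string> {
  const candidates = fallback ? [primary, fallback] : [primary];
  for (const tool of candidates) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await tool();
      } catch (err) {
        // Permanent errors move on to the next tool; transient ones back off.
        if (!isTransient(err) || attempt === maxRetries) break;
        await wait(backoffDelayMs(attempt));
      }
    }
  }
  // Every tool failed: hand the problem to a human.
  return escalate(`all ${candidates.length} tool(s) failed`);
}
```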

Module 3: Planning and Reasoning (Detailed)

Planning is how agents decompose complex tasks into manageable steps. Nexus agents use a recursive planning approach: the top-level plan defines major phases; each phase is decomposed into sub-steps; sub-steps may be further decomposed if needed. The planning depth is adaptive: the agent decides how much decomposition is needed based on task complexity.
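Recursive, adaptive decomposition produces a tree: each goal is either treated as directly executable or split into subgoals, down to a depth limit. In this hypothetical sketch, `decompose` stands in for a model call and `isSimple` for the agent's stopping judgment; `PlanNode`, `buildPlan`, and the default depth limit of 3 are illustrative names and values, not Nexus internals.

```typescript
// Hypothetical sketch: `decompose` stands in for a model call and `isSimple`
// for the agent's judgment that a goal can be executed directly.
interface PlanNode {
  goal: string;
  children: PlanNode[];
}

function buildPlan(
  goal: string,
  decompose: (goal: string) => string[],
  isSimple: (goal: string) => boolean,
  maxDepth = 3,
  depth = 0,
): PlanNode {
  // Adaptive depth: stop when the goal is simple enough or the budget is spent.
  if (depth >= maxDepth || isSimple(goal)) return { goal, children: [] };
  const children = decompose(goal).map((sub) =>
    buildPlan(sub, decompose, isSimple, maxDepth, depth + 1),
  );
  return { goal, children };
}

// The leaves, read left to right, are the concrete steps to execute.
function leafSteps(node: PlanNode): string[] {
  if (node.children.length === 0) return [node.goal];
  return node.children.reduce<string[]>((acc, child) => acc.concat(leafSteps(child)), []);
}

// Usage: a lookup table plays the role of the model's decomposition.
const subgoals: Record<string, string[]> = {
  "upgrade lodash": ["find usages", "bump version", "verify"],
  verify: ["run unit tests", "run lint"],
};
const plan = buildPlan(
  "upgrade lodash",
  (g) => subgoals[g] || [],
  (g) => !(g in subgoals),
);
console.log(leafSteps(plan)); // the four leaf steps, in execution order
```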

Chain-of-thought reasoning. Agents show their work through structured reasoning traces. Each reasoning step includes: the current goal (what the step is trying to achieve), relevant context (information from tools or memory supporting the step), the reasoning (how the agent arrived at its decision), the action taken (tool call or internal computation), and the outcome (what happened after the action). These traces are inspectable via nexus logs --agent my-agent --format detailed.
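One way to picture a single trace entry is as a record with the five fields listed above. The interface and formatter below are a hypothetical sketch: the field names are chosen for readability, the real output of nexus logs --format detailed may be laid out differently, and the sample values are made up.

```typescript
// Hypothetical shape of one reasoning step; field names are illustrative and
// may not match the actual `nexus logs` output.
interface ReasoningStep {
  goal: string; // what the step is trying to achieve
  context: string[]; // supporting information from tools or memory
  reasoning: string; // how the agent reached its decision
  action: string; // tool call or internal computation
  outcome: string; // what happened after the action
}

function formatTrace(steps: ReasoningStep[]): string {
  return steps
    .map((s, i) =>
      [
        `Step ${i + 1}: ${s.goal}`,
        `  context:   ${s.context.join(", ")}`,
        `  reasoning: ${s.reasoning}`,
        `  action:    ${s.action}`,
        `  outcome:   ${s.outcome}`,
      ].join("\n"),
    )
    .join("\n\n");
}

// Made-up example step.
console.log(
  formatTrace([
    {
      goal: "Find all TODO comments",
      context: ["task description", "tool docs for search_files"],
      reasoning: "search_files is the closest match for a text search",
      action: 'search_files({ pattern: "src/**/*.ts" })',
      outcome: "returned a list of matching files",
    },
  ]),
);
```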

Plan validation. Before executing a plan, the agent validates: goal alignment (does the plan address the task?), completeness (are all required sub-steps present?), dependency ordering (are steps in the correct order?), resource feasibility (are required tools and permissions available?). If validation fails, the agent revises the plan before execution.
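Three of the four checks are mechanical and can be sketched directly; goal alignment needs the model's own judgment and is left out. `PlanStep` and `validatePlan` are hypothetical names, and the tool names reuse those from the capstone. An empty result means the plan may proceed; otherwise the problems feed the revision step.

```typescript
// Hypothetical plan representation and checks; not Nexus internals.
interface PlanStep {
  id: string;
  tool: string;
  dependsOn: string[]; // ids of steps that must finish first
}

function validatePlan(steps: PlanStep[], requiredIds: string[], availableTools: Set<string>): string[] {
  const problems: string[] = [];
  const present = new Set(steps.map((s) => s.id));
  // Completeness: every required sub-step appears in the plan.
  for (const id of requiredIds) {
    if (!present.has(id)) problems.push(`missing step: ${id}`);
  }
  const done = new Set<string>();
  for (const step of steps) {
    // Dependency ordering: each dependency must appear earlier in the plan.
    for (const dep of step.dependsOn) {
      if (!done.has(dep)) problems.push(`${step.id} depends on ${dep}, which has not run yet`);
    }
    // Resource feasibility: the step's tool must be available to the agent.
    if (!availableTools.has(step.tool)) problems.push(`${step.id} needs unavailable tool ${step.tool}`);
    done.add(step.id);
  }
  return problems; // empty means the plan may proceed
}

const capstoneTools = new Set(["package_manager", "test_runner", "git_ops"]);
const draftPlan: PlanStep[] = [
  { id: "run-tests", tool: "test_runner", dependsOn: ["bump"] },
  { id: "bump", tool: "package_manager", dependsOn: [] },
];
console.log(validatePlan(draftPlan, ["bump", "run-tests", "open-pr"], capstoneTools));
```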

Module 4: Memory and Context (Detailed)

Memory is what separates a disposable agent from a learning one. Nexus agents maintain three tiers of memory. Working memory holds the current task context: the files being edited, the task description, and recent tool outputs. It is limited by the model's context window (128K tokens by default). Episodic memory stores complete session traces covering the past 7 days or 200 sessions, so the agent can recall exactly what it did in previous sessions. Consolidated memory is permanent knowledge distilled from patterns that recur across sessions.
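The two numeric limits in this paragraph can be illustrated with a short sketch. It assumes, without confirmation from the Nexus docs, that tokens are estimated at roughly 4 characters each and that an episodic session is kept only while it is both within 7 days and among the 200 most recent. `fitWorkingMemory` and `pruneEpisodic` are hypothetical helpers.

```typescript
// Hypothetical sketch. Assumptions: ~4 characters per token, and an episodic
// session survives only if it is within 7 days AND among the 200 newest.
const CONTEXT_WINDOW_TOKENS = 128000;
const EPISODIC_MAX_DAYS = 7;
const EPISODIC_MAX_SESSIONS = 200;
const DAY_MS = 24 * 60 * 60 * 1000;

// Rough token estimate; real tokenizers vary by model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Working memory: always keep the task, then as many of the most recent tool
// outputs as fit in the token budget (older outputs are dropped first).
function fitWorkingMemory(task: string, toolOutputs: string[], budget = CONTEXT_WINDOW_TOKENS): string[] {
  let used = estimateTokens(task);
  const kept: string[] = [];
  for (let i = toolOutputs.length - 1; i >= 0; i--) {
    const cost = estimateTokens(toolOutputs[i]);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(toolOutputs[i]);
  }
  return [task].concat(kept);
}

interface Session {
  id: string;
  endedAt: number; // epoch milliseconds
}

// Episodic memory: drop sessions older than 7 days, then keep the 200 newest.
function pruneEpisodic(sessions: Session[], now: number): Session[] {
  return sessions
    .filter((s) => now - s.endedAt <= EPISODIC_MAX_DAYS * DAY_MS)
    .sort((a, b) => b.endedAt - a.endedAt)
    .slice(0, EPISODIC_MAX_SESSIONS);
}

console.log(fitWorkingMemory("Fix the failing test", ["old log", "new log"], 10));
```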

Memory configuration exercise. For a project that needs to remember coding conventions across sessions but has a limited storage budget, configure tier: compact (which skips episodic storage and writes directly to compressed memory), retention_days: 180 (longer retention for conventions), and a "coding-conventions" knowledge base for authoritative rules. Verify the result with nexus memory --agent my-agent --stats.

Exercises

Exercise 1: Deploy an agent with a single tool (code_search only) and ask it to find all functions in your project that lack TypeScript type annotations.

Exercise 2: Create a nexus.yaml with two different tool configurations. Deploy two agents with the same model but different tools, and compare their output for the same task.

Exercise 3: Run the same task 5 times and compare the agent's approach across runs. Did it choose the same tools? The same plan structure? What varied?
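For Exercise 3, a small helper makes the comparison concrete: record the ordered tool calls from each run, then count the distinct sequences and how many runs used each tool. `compareRuns` and the tool name `read_file` are hypothetical, and the five runs in the usage example are made-up data.

```typescript
// Hypothetical helper for Exercise 3. Each run is the ordered list of tools
// the agent called, copied by hand from its logs.
interface RunComparison {
  distinctSequences: number;
  sequenceCounts: Map<string, number>; // "a > b > c" -> number of runs
  runsUsingTool: Map<string, number>; // tool -> number of runs that called it
}

function compareRuns(runs: string[][]): RunComparison {
  const sequenceCounts = new Map<string, number>();
  const runsUsingTool = new Map<string, number>();
  for (const run of runs) {
    const key = run.join(" > ");
    sequenceCounts.set(key, (sequenceCounts.get(key) || 0) + 1);
    // Count each tool once per run, even if it was called several times.
    const unique = run.filter((tool, i) => run.indexOf(tool) === i);
    for (const tool of unique) {
      runsUsingTool.set(tool, (runsUsingTool.get(tool) || 0) + 1);
    }
  }
  return { distinctSequences: sequenceCounts.size, sequenceCounts, runsUsingTool };
}

// Made-up runs for illustration.
const report = compareRuns([
  ["code_search", "read_file"],
  ["code_search", "read_file"],
  ["code_search", "run_tests", "read_file"],
  ["code_search", "read_file"],
  ["read_file"],
]);
console.log(report.distinctSequences, report.runsUsingTool);
```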

Module 5: Agent Safety and Guardrails (Detailed)

Safety is a first-class concern in agent systems. Unlike traditional software where actions are deterministic and bounded, agents can take unexpected actions with real-world consequences. Nexus provides three layers of safety: compile-time guardrails (configuration-based restrictions that cannot be overridden), runtime monitoring (continuous evaluation of agent behavior), and post-hoc auditing (immutable logs for compliance).

Guardrail configuration. Path restrictions prevent agents from modifying critical system files. Network restrictions prevent agents from connecting to unauthorized endpoints. Execution restrictions limit resource usage and sandbox code execution. Review gates require human approval before agent actions take effect. Configure these in nexus.yaml as shown in the Agent Configuration tutorial.
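Path restrictions are easy to get wrong with a plain prefix match, since src/../secrets.env starts with src/. The sketch below normalizes paths before comparing them with an allowlist such as the capstone's src/ and tests/. `normalizeRelative` and `isPathAllowed` are hypothetical helpers that show the idea; they are not how Nexus enforces guardrails.

```typescript
// Hypothetical allowlist check, for illustration only.

// Resolve "." and ".." segments; returns "" if the path escapes the root.
function normalizeRelative(p: string): string {
  const parts: string[] = [];
  for (const seg of p.replace(/\\/g, "/").split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (parts.length === 0) return "";
      parts.pop();
    } else {
      parts.push(seg);
    }
  }
  return parts.join("/");
}

function isPathAllowed(target: string, allowedPaths: string[]): boolean {
  // Absolute paths (POSIX or Windows drive letters) are outside the project.
  if (target.charAt(0) === "/" || /^[A-Za-z]:/.test(target)) return false;
  const norm = normalizeRelative(target);
  if (norm === "") return false;
  return allowedPaths.some((allowed) => {
    const root = normalizeRelative(allowed);
    // Match the directory itself or anything under it, never "srcfoo/".
    return root !== "" && (norm === root || norm.indexOf(root + "/") === 0);
  });
}

const allowedRoots = ["src/", "tests/"];
console.log(isPathAllowed("src/index.ts", allowedRoots)); // true
console.log(isPathAllowed("src/../secrets.env", allowedRoots)); // false
```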

Capability-based security. Each agent operates with a capability set that defines exactly what it can do. A code review agent might have read access to the entire codebase but write access only to PR comments. A deployment agent might have execute access to the CI/CD pipeline but no access to source code. Capabilities are enforced at the infrastructure level, not the agent level, meaning they cannot be bypassed even by a compromised agent.
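The two example agents can be written down as capability sets, with the check placed in a gateway that sits between the agent and its tools. This is a hypothetical sketch: the resource names, the `Action` values, and `gatewayInvoke` are invented for illustration. The design point it shows is that the agent never performs the check itself, so a compromised agent cannot skip it.

```typescript
// Hypothetical capability model; resource names are illustrative.
type Action = "read" | "write" | "execute";

interface Capability {
  resource: string;
  actions: Action[];
}

function can(caps: Capability[], resource: string, action: Action): boolean {
  return caps.some((c) => c.resource === resource && c.actions.indexOf(action) !== -1);
}

// The infrastructure-side gateway runs the check before any tool executes;
// the agent only ever sees the result or the denial.
function gatewayInvoke<T>(caps: Capability[], resource: string, action: Action, run: () => T): T {
  if (!can(caps, resource, action)) {
    throw new Error(`denied: ${action} on ${resource}`);
  }
  return run();
}

// The two agents described above.
const codeReviewAgent: Capability[] = [
  { resource: "codebase", actions: ["read"] },
  { resource: "pr_comments", actions: ["read", "write"] },
];
const deploymentAgent: Capability[] = [{ resource: "ci_cd_pipeline", actions: ["execute"] }];

console.log(can(codeReviewAgent, "codebase", "write")); // false
console.log(can(deploymentAgent, "codebase", "read")); // false
```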

Module 6: Capstone Project (Detailed)

The capstone project combines everything you have learned. Your goal: design, configure, deploy, and evaluate an agent for a real engineering task. Recommended project: build an agent that automates dependency updates across a project.

Project requirements. The agent should: scan the project for outdated dependencies, check each dependency for breaking changes, update to the latest compatible version, run the test suite after each update, generate a summary of changes made, and create a pull request with the changes. Configure: tools (package_manager, test_runner, git_ops), memory (persistent, 30-day retention for learning update patterns), guardrails (allowed_paths: src/ and tests/, require_review: true for production projects), and custom instructions (prefer safe version bumps, run full test suite after each change).
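The instruction to prefer safe version bumps can be made concrete by classifying each candidate version. This sketch assumes plain MAJOR.MINOR.PATCH strings (no pre-release tags) and treats only a change of major version as unsafe; for 0.x packages, semantic versioning allows minor bumps to break, which this simplification ignores. `classifyBump`, `compareVersions`, and `latestCompatible` are hypothetical helpers.

```typescript
// Hypothetical helpers; assumes plain MAJOR.MINOR.PATCH versions.
type Bump = "major" | "minor" | "patch" | "none";

function parseVersion(v: string): number[] {
  return v.split(".").map(Number);
}

function classifyBump(from: string, to: string): Bump {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a[0] !== b[0]) return "major";
  if (a[1] !== b[1]) return "minor";
  if (a[2] !== b[2]) return "patch";
  return "none";
}

// Negative if x < y, positive if x > y, 0 if equal.
function compareVersions(x: string, y: string): number {
  const a = parseVersion(x);
  const b = parseVersion(y);
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Newest available version above `current` that stays on the same major.
function latestCompatible(current: string, available: string[]): string | undefined {
  const candidates = available
    .filter((v) => compareVersions(v, current) > 0 && classifyBump(current, v) !== "major")
    .sort(compareVersions);
  return candidates[candidates.length - 1];
}

console.log(latestCompatible("4.17.1", ["4.16.0", "4.17.21", "4.18.0", "5.0.0"])); // 4.18.0
```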

Evaluation criteria. Task completion rate (>80%), update accuracy (no breaking changes introduced), test pass rate after updates (>95%), time per update (<10 minutes), and human review acceptance rate (>90%).
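The criteria can be checked mechanically once each update attempt is recorded. The record shape below is an assumption, and so is reading "time per update" as an average; use the maximum instead for a stricter reading. `UpdateRecord` and `evaluate` are hypothetical names.

```typescript
// Hypothetical record of one dependency-update attempt.
interface UpdateRecord {
  completed: boolean; // agent finished the update end to end
  breakingChange: boolean; // a breaking change reached the PR
  testsPassed: number;
  testsTotal: number;
  minutes: number;
  reviewAccepted: boolean; // human reviewer accepted the PR
}

function evaluate(records: UpdateRecord[]) {
  if (records.length === 0) throw new Error("no update records to evaluate");
  const n = records.length;
  const share = (pred: (r: UpdateRecord) => boolean) => records.filter(pred).length / n;
  const sum = (f: (r: UpdateRecord) => number) => records.reduce((acc, r) => acc + f(r), 0);

  const testsTotal = sum((r) => r.testsTotal);
  const metrics = {
    completionRate: share((r) => r.completed),
    breakingChanges: records.filter((r) => r.breakingChange).length,
    testPassRate: testsTotal === 0 ? 0 : sum((r) => r.testsPassed) / testsTotal,
    avgMinutes: sum((r) => r.minutes) / n, // assumption: average, not maximum
    reviewAcceptance: share((r) => r.reviewAccepted),
  };
  // Targets from the evaluation criteria above.
  const passes = {
    completionRate: metrics.completionRate > 0.8,
    updateAccuracy: metrics.breakingChanges === 0,
    testPassRate: metrics.testPassRate > 0.95,
    timePerUpdate: metrics.avgMinutes < 10,
    reviewAcceptance: metrics.reviewAcceptance > 0.9,
  };
  return { metrics, passes };
}
```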