Step 06

Production deployment

Deploy agents at scale: CI/CD integration, monitoring, audit logs, and team management.

CI/CD integration

Nexus agents integrate natively with your CI/CD pipeline. Add a step to your GitHub Actions workflow:

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: nexus/agent-action@v1
        with:
          api-key: ${{ secrets.NEXUS_API_KEY }}
          task: "Review this PR for bugs and style issues"

The agent will analyze the PR diff, run tools, and post comments directly on the pull request.

Monitoring and observability

Every agent action is logged with full causal tracing. View agent metrics in the Nexus dashboard or export them to your observability stack:

nexus metrics --agent my-agent --format json

Key metrics include task completion rate, tool call accuracy, reasoning step count, memory retrieval precision, and latency percentiles. Integrations with Datadog, Grafana, and Sentry are available out of the box.
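
To make the metric names above concrete, here is a minimal Python sketch that derives a completion rate and latency percentiles from a JSON export. The field names (tasks, completed, total, latency_ms) are assumptions made for illustration; the actual schema emitted by nexus metrics --format json is not documented here and may differ.

```python
import json
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(raw_json):
    # ASSUMPTION: hypothetical export schema, not the documented Nexus format.
    data = json.loads(raw_json)
    latencies = data["latency_ms"]
    tasks = data["tasks"]
    return {
        "completion_rate": tasks["completed"] / tasks["total"],
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }


# Hand-written sample data standing in for a real export.
sample = json.dumps({
    "tasks": {"completed": 9, "total": 10},
    "latency_ms": [120, 80, 200, 95, 150, 110, 400, 90, 130, 105],
})
summary = summarize(sample)
print(summary)
```

Nearest-rank is only one of several percentile definitions; an observability backend may interpolate between samples and report slightly different values.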

Audit logs

For compliance and debugging, every agent action is recorded in an immutable audit log. View logs at any time:

nexus logs --agent my-agent --tail 100 --format detailed

Audit logs include the agent's full reasoning trace, all tool inputs and outputs, timestamps, and the human reviewer's decision (if applicable). Logs are retained for 1 year by default.
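
The text does not say how immutability is enforced. One common technique for making a log tamper-evident is hash chaining, where each entry stores the hash of the previous one. The Python sketch below illustrates that general idea only; it is not a description of Nexus internals, and the entry fields (action, detail) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Serialize with sorted keys so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log, action, detail):
    """Append an entry whose hash covers its content and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"action": action, "detail": detail, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Return True only if every entry links to its predecessor and matches its own hash."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("action", "detail", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "tool_call", {"tool": "grep", "input": "TODO"})
append_entry(log, "review", {"decision": "approved"})
print(verify(log))
```

Editing any earlier entry changes its hash and breaks the link stored in every later entry, so verification fails unless the whole chain is rewritten.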

Team management

Manage multiple agents across your team using the Nexus dashboard or API. Create agent teams, set resource quotas, and configure role-based access control:

nexus team create --name backend-team
nexus team add-agent --team backend-team --agent code-architect
nexus team set-quota --team backend-team --max-concurrent 5

Enterprise features include SSO (SAML/OIDC), SCIM provisioning, and custom audit retention policies.
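
The --max-concurrent quota shown above caps how many agents in a team run at once. As a rough mental model, assuming the quota behaves like a counting semaphore (the platform's actual enforcement is not described here), a Python sketch looks like this. The ConcurrencyQuota class is hypothetical and not part of any Nexus SDK.

```python
import threading


class ConcurrencyQuota:
    """Hypothetical model of a team quota: at most max_concurrent sessions at a time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self):
        # Non-blocking: returns False immediately when the quota is exhausted.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


quota = ConcurrencyQuota(max_concurrent=5)
granted = [quota.try_acquire() for _ in range(6)]
print(granted)  # five requests succeed, the sixth is refused

quota.release()  # one session finishes
retry = quota.try_acquire()  # the freed slot can be taken again
```

Whether the real platform rejects the sixth request or queues it is not stated in this guide; the sketch rejects for simplicity.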

Security hardening

For production deployments, implement these security measures.

Capability-based access control: Define agent capabilities in nexus.yaml using the capabilities section. Restrict each agent to the minimum set of tools and resources needed for its tasks.

Network segmentation: Run agents in isolated network namespaces. Use allowed_hosts in guardrails to restrict outbound connections. Default: deny all; explicitly allow required endpoints.

Secrets management: Never store API keys, database credentials, or other secrets in nexus.yaml. Use the Nexus secrets vault:

nexus secrets set DB_PASSWORD --value "..."

Secrets are encrypted at rest and injected as environment variables at agent startup.

Audit logging: Enable detailed audit logging for compliance:

audit:
  level: detailed
  retention_days: 365

This captures all agent actions, tool I/O, and human review decisions in an immutable log.
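
The default-deny rule for allowed_hosts can be illustrated with a short Python sketch. It assumes exact, case-insensitive hostname matching; whether Nexus also supports wildcards, ports, or CIDR ranges is not stated here. The is_allowed function and the example host list are hypothetical.

```python
from urllib.parse import urlsplit


def is_allowed(url, allowed_hosts):
    """Default deny: permit a request only if its hostname is explicitly listed."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if no host is present
    if host is None:
        return False
    return host in {h.lower() for h in allowed_hosts}


# Hypothetical allow-list for illustration.
allowed = ["api.github.com", "registry.npmjs.org"]

print(is_allowed("https://api.github.com/repos", allowed))
print(is_allowed("https://example.com/", allowed))
```

An empty allow-list rejects every URL, which matches the deny-all default described above.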

Cost optimization

Agent costs are driven by model inference, tool execution, and memory storage. Optimize costs with these strategies.

Model selection: Use smaller, faster models for simple tasks (e.g., code search, documentation generation) and reserve large models for complex reasoning tasks. Configure per-agent model selection in nexus.yaml.

Task batching: Combine related tasks into single agent sessions to avoid repeated model loading overhead. The SDK's askBatch() method batches multiple tasks into one session.

Memory tier optimization: Use tier: compact for cost-sensitive deployments. This reduces storage costs by ~70% with only a ~10% reduction in cross-session learning effectiveness.

Idle agent management: Set auto_suspend: 5m to automatically suspend agents after 5 minutes of inactivity. Suspended agents consume no inference resources and minimal storage.

Expected monthly costs for reference deployments:

Development (1 agent, 8h/day, simple tasks): ~$200/month
Team (5 agents, 24/7, mixed tasks): ~$1,200/month
Enterprise (20 agents, 24/7, complex tasks with full memory): ~$4,500/month

Actual costs vary based on task complexity, model choice, and memory configuration.
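
To compare the reference deployments on equal terms, the Python sketch below converts each monthly figure into a cost per agent and a cost per agent-hour. The monthly totals come from the list above; the 30-day month is an assumption, and the results are rough estimates rather than published pricing.

```python
DAYS_PER_MONTH = 30  # ASSUMPTION: a 30-day month, for round numbers

# Figures taken from the reference deployments above.
deployments = {
    "Development": {"agents": 1, "hours_per_day": 8, "monthly_usd": 200},
    "Team": {"agents": 5, "hours_per_day": 24, "monthly_usd": 1200},
    "Enterprise": {"agents": 20, "hours_per_day": 24, "monthly_usd": 4500},
}


def unit_costs(d):
    """Return the monthly cost per agent and per agent-hour for one deployment."""
    agent_hours = d["agents"] * d["hours_per_day"] * DAYS_PER_MONTH
    return {
        "per_agent": d["monthly_usd"] / d["agents"],
        "per_agent_hour": d["monthly_usd"] / agent_hours,
    }


costs = {name: unit_costs(d) for name, d in deployments.items()}
for name, c in costs.items():
    print(f"{name}: ${c['per_agent']:.0f}/agent, ${c['per_agent_hour']:.2f}/agent-hour")
```

Under these assumptions the per-agent cost is similar across tiers ($200 to $240), while the cost per agent-hour is much higher for the part-time Development deployment (about $0.83 versus roughly $0.31 to $0.33).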

Multi-region deployment

For global teams, deploy agents in multiple regions for reduced latency and compliance. Nexus supports multi-region deployment with automatic request routing. Configure regions in nexus.yaml:

regions:
  primary: us-west-2
  replicas:
    - eu-west-1
    - ap-southeast-1
routing: latency-based  # or geo-based

The memory layer is replicated across regions with eventual consistency (typical propagation delay: 2-5 seconds). For consistency-sensitive workloads, pin agents to the primary region.
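
The routing behavior described above can be sketched in Python as follows. This is an illustrative model only: the choose_region function and the latency measurements are hypothetical, and the platform's real routing logic is not documented here. The region names match the configuration shown earlier.

```python
PRIMARY = "us-west-2"
REPLICAS = ["eu-west-1", "ap-southeast-1"]


def choose_region(latency_ms, consistency_sensitive=False):
    """Pick the lowest-latency region, or pin to the primary when consistency matters."""
    if consistency_sensitive:
        # Replicas may lag the primary by a few seconds, so avoid them entirely.
        return PRIMARY
    candidates = [r for r in [PRIMARY, *REPLICAS] if r in latency_ms]
    if not candidates:
        return PRIMARY  # no measurements available: fall back to the primary
    return min(candidates, key=lambda r: latency_ms[r])


# Hypothetical round-trip times measured from a client in Europe.
measured = {"us-west-2": 140.0, "eu-west-1": 18.0, "ap-southeast-1": 210.0}

print(choose_region(measured))
print(choose_region(measured, consistency_sensitive=True))
```

Pinning trades latency for read-your-writes behavior: the European client in this example pays the higher round-trip time to us-west-2 in exchange for never reading stale memory.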

Scaling agents

As your usage grows, Nexus agents scale horizontally. The platform automatically load-balances across available compute resources. For self-hosted deployments, use our Kubernetes operator:

kubectl apply -f https://nexus.run/k8s/operator.yaml

The operator manages agent pods, memory stores, and tool sandboxes across your cluster.