137 changes: 137 additions & 0 deletions .agent/knowledge/_shared/ai-knowledge-base-guide.md
@@ -0,0 +1,137 @@
# AI Knowledge Base Architecture Guide

How to set up, maintain, and extend the neops AI knowledge base for effective AI-assisted development.

## Architecture Overview

The knowledge base follows Anthropic's context engineering principles: token efficiency, progressive disclosure, and graceful degradation. Every file must justify its existence by providing high-signal context that agents cannot infer from code alone.

### File Structure (Per Repository)

```
AGENTS.md                            # Universal AI context (~100 lines, always loaded)
CLAUDE.md                            # Claude Code pointer to AGENTS.md
.cursor/rules/                       # Cursor-specific rules with glob matching
  project-context.mdc                # Always-on project context
  documentation-writing.mdc          # Triggered when editing docs/**
.agent/knowledge/
  _shared/                           # Shared across all neops repos (future: neops-ai-context submodule)
    neops-ecosystem-overview.md      # Platform overview, components, data flow
    component-architectures.md       # All component architectures in one file
    documentation-playbook.md        # How to write docs for any neops component
    documentation-personas.md        # Review persona definitions
    cross-project-patterns.md        # Cross-repo conventions, testing patterns
    ai-knowledge-base-guide.md       # This file
  <project-specific files>           # Unique to each repo (audits, link tracking, etc.)
```

### Design Principles

1. **Progressive disclosure**: AGENTS.md gives enough context for any agent to start working. Deeper knowledge is discovered on-demand when agents explore `.agent/knowledge/`.
2. **Token efficiency**: ~570 lines across 6 shared files (down from 1400+ across 12+ duplicated files). No filler, and nothing that duplicates rules already enforced by linters or CI.
3. **Single source of truth**: `_shared/` files are identical across all repos. Future plan: extract to `neops-ai-context` repo as git submodule at `.agent/shared/`.
4. **Graceful degradation**: If an agent only reads AGENTS.md, it can still function. If `_shared/` isn't available, project-specific files and AGENTS.md provide sufficient context.
5. **Agent-agnostic**: Works with Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, and any tool that reads markdown files from the repo.

### Root File Strategy

| File | Purpose | Loaded by |
|---|---|---|
| `AGENTS.md` | Primary AI context (vendor-neutral, AGENTS.md open standard) | Cursor, Copilot, Codex, Gemini CLI, Claude Code |
| `CLAUDE.md` | Pointer to AGENTS.md + Claude-specific notes | Claude Code |
| `.cursor/rules/*.mdc` | Glob-matched rules (e.g., docs/** triggers writing conventions) | Cursor only |

## Maintenance Guidelines

### When to Update

- **After implementing a new feature**: update component-architectures.md and AGENTS.md
- **After writing documentation**: update project-specific audit files
- **After discovering implementation gaps**: update ecosystem overview's status section
- **After establishing new conventions**: update documentation-playbook.md or cross-project-patterns.md
- **After a persona review round**: update documentation-personas.md if review process changed

### How to Keep Shared Files in Sync

Until the `neops-ai-context` repo exists, shared files must be manually kept identical:

1. Edit the file in one repo
2. Copy it to each of the other repos, for example: `cp neops-workflow-engine/.agent/knowledge/_shared/file.md neops-worker-sdk-py/.agent/knowledge/_shared/file.md` (repeat for every remaining repo)
3. Commit in all repos, referencing the same change

Future: `neops-ai-context` repo as submodule at `.agent/shared/` eliminates manual sync.
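
The manual sync above can be wrapped in a small script. This is a sketch, not tooling that exists in any repo; the repo names in the usage comment are placeholders for your local checkouts:

```shell
# sync_shared SRC DEST...: copy SRC's _shared/ files into each DEST repo,
# then verify the copies are identical.
sync_shared() {
  src="$1"; shift
  for dest in "$@"; do
    mkdir -p "$dest/.agent/knowledge/_shared"
    cp "$src"/.agent/knowledge/_shared/*.md "$dest/.agent/knowledge/_shared/"
    # diff exits non-zero if the copies ever drift apart
    diff -r "$src/.agent/knowledge/_shared" "$dest/.agent/knowledge/_shared" >/dev/null
  done
}

# Usage sketch:
# sync_shared neops-workflow-engine neops-worker-sdk-py
```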

### What Goes Where

| Content Type | Location |
|---|---|
| Ecosystem-wide knowledge | `_shared/` |
| Component architecture details | `_shared/component-architectures.md` |
| Project-specific doc audit | Project root `.agent/knowledge/` |
| Project-specific link tracking | Project root `.agent/knowledge/` |
| Review findings | Project root `.agent/knowledge/` |

## Bootstrapping a New Neops Component

### Prompt Template

Use this prompt with any AI coding agent to bootstrap knowledge files for a new neops component repository:

---

**Prompt for AI agents:**

> You are setting up AI knowledge files for the `{REPO_NAME}` repository, a component of the neops network automation platform.
>
> **Step 1: Copy shared knowledge**
> Copy all files from an existing neops repo's `.agent/knowledge/_shared/` directory to this repo's `.agent/knowledge/_shared/`. These files contain ecosystem-wide context that must be identical across all repos.
>
> **Step 2: Create AGENTS.md**
> Create an `AGENTS.md` at the repo root following this structure (~100-120 lines):
> - Overview (3-4 sentences about this specific component)
> - Tech Stack (bullet list)
> - Architecture (key concepts and data flow, brief)
> - Development (build, test, lint commands — copy from README or Makefile)
> - Project Structure (key directories with one-line descriptions)
> - Conventions (coding style, naming patterns, import rules)
> - Neops Ecosystem (brief context, ~20 lines, pointer to `.agent/knowledge/_shared/`)
> - AI Agent Guidance (how agents should approach work in this repo)
>
> **Step 3: Create CLAUDE.md**
> Create a `CLAUDE.md` with: "See AGENTS.md for full project context. For deeper knowledge, explore .agent/knowledge/."
>
> **Step 4: Create .cursor/rules/**
> Create `.cursor/rules/project-context.mdc` (alwaysApply: true) with project-specific context.
> If the repo has documentation, create `.cursor/rules/documentation-writing.mdc` (globs: docs/**) with writing conventions.
>
> **Step 5: Create project-specific knowledge**
> Explore the codebase and create project-specific knowledge files in `.agent/knowledge/`:
> - `{repo-name}-docs-audit.md` (if docs exist: structure, quality, gaps)
> - `missing-external-links.md` (cross-project links that need resolution)
>
> **Step 6: Verify**
> - Ensure `_shared/` files are byte-identical to other repos (use `diff` to confirm)
> - Verify commands in AGENTS.md Development section actually run successfully
> - Verify Project Structure section matches the filesystem (`ls -la`)
> - Verify configuration table matches actual environment variables in source code
> - Ensure .cursor/rules/ glob patterns match actual directory names (e.g., `docs/**` not `documentation/**`)
> - Ensure .cursor/rules/ don't duplicate content already in AGENTS.md
>
> **Fallback for Step 1**: If no other neops repo is available locally, clone any neops component repo
> and copy its `_shared/` directory. The canonical files are kept in sync across all repos.

---
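
As a sketch of Step 4, a minimal `.cursor/rules/project-context.mdc` might look like this. The frontmatter keys follow the Cursor rule conventions referenced above (`alwaysApply`, `globs`); the body text is illustrative:

```
---
description: Project-wide context for this neops component
alwaysApply: true
---

This repo is a component of the neops network automation platform.
See AGENTS.md for full project context; explore .agent/knowledge/ for
deeper knowledge. Do not duplicate AGENTS.md content here.
```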

### Checklist for New Component

- [ ] `.agent/knowledge/_shared/` contains all 6 shared files (identical to other repos)
- [ ] `AGENTS.md` exists at repo root with accurate project context
- [ ] `CLAUDE.md` exists at repo root pointing to AGENTS.md
- [ ] `.cursor/rules/project-context.mdc` exists with project-specific context
- [ ] `.cursor/rules/documentation-writing.mdc` exists if repo has docs
- [ ] Project-specific knowledge files created in `.agent/knowledge/`
- [ ] `.gitignore` does NOT exclude `.cursor/` (rules should be committed)

## Future: neops-ai-context Repository

Planned: extract `_shared/` to a dedicated `neops-ai-context` git submodule at `.agent/shared/` in each repo, eliminating manual sync. The current structure is designed for trivial extraction.
89 changes: 89 additions & 0 deletions .agent/knowledge/_shared/component-architectures.md
@@ -0,0 +1,89 @@
# Neops Component Architectures

Consolidated architecture reference for all neops platform components.

## Workflow Engine (NestJS/TypeScript)

### Workflow Definition (YAML)

Top-level fields: `label`, `package`, `name`, `majorVersion`/`minorVersion`/`patchVersion`, `seedEntity` (device|interface|group|global), `description`, `parameterSchema`, `acquire[]`, `type: workflow`, `steps[]`.

Step types: `functionBlock` (execute registered FB), `workflow` (inline nested), `workflowReference` (reference another definition — not yet implemented).

Parameters use `{{ jmesPath }}` interpolation against the execution context. Conditions (`condition.jmes`) skip steps; assertions (`assert.jmes`) fail execution.
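
Putting the fields above together, a definition might look like the sketch below. The top-level fields are taken from the list above and the package name follows the `fb.examples.neops.io` convention, but the step-level nesting and parameter names are assumptions for illustration, not the verified schema:

```yaml
label: Show Version Example
package: fb.examples.neops.io
name: show-version-example        # hypothetical definition name
majorVersion: 1
minorVersion: 0
patchVersion: 0
seedEntity: device
description: Run show_version against the seed device.
type: workflow
steps:
  - functionBlock:                # step nesting is a sketch
      name: show_version
      package: fb.examples.neops.io
      parameters:
        hostname: "{{ device.hostname }}"        # JMESPath interpolation
      condition:
        jmes: "device.connection_state == 'OK'"  # skip step if false
```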

### Blackboard & Job Lifecycle

`PENDING` → `POLLED` (assigned to worker) → `PUSHED` (result received). Job types: `ACQUIRE`, `EXECUTE`, `ROLLBACK`.

Worker API: `/workers/register` (POST), `/workers/:uuid/ping` (POST heartbeat), `/function-blocks/register` (POST), `/blackboard/job` (POST poll), `/blackboard/job/result` (POST push).

Worker states: ONLINE → UNREACHABLE (2min no ping) → OFFLINE (6min) → deleted (24h). Stuck jobs (>12min POLLED) auto-failed.
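
The worker lifecycle thresholds above can be sketched as a simple mapping from heartbeat age to state (state names from this document; the function itself is illustrative, not engine code):

```python
from enum import Enum

class WorkerState(Enum):
    ONLINE = "ONLINE"
    UNREACHABLE = "UNREACHABLE"
    OFFLINE = "OFFLINE"
    DELETED = "DELETED"

# Thresholds from the lifecycle above (2 min, 6 min, 24 h), in seconds.
UNREACHABLE_AFTER = 2 * 60
OFFLINE_AFTER = 6 * 60
DELETE_AFTER = 24 * 60 * 60

def worker_state(seconds_since_ping: float) -> WorkerState:
    """Map time since the last heartbeat to the worker lifecycle state."""
    if seconds_since_ping >= DELETE_AFTER:
        return WorkerState.DELETED
    if seconds_since_ping >= OFFLINE_AFTER:
        return WorkerState.OFFLINE
    if seconds_since_ping >= UNREACHABLE_AFTER:
        return WorkerState.UNREACHABLE
    return WorkerState.ONLINE
```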

### Pure/Idempotent Semantics

Engine tracks `isPureExecution` and `isIdempotentExecution` across steps. Failed workflow with only pure steps → `FAILED_SAFE`. Non-pure failure → `FAILED_UNSAFE`. Auto-retry for pure/idempotent is planned but not implemented; retry count is hardcoded to 3 in `job-executor.ts`.
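
The FAILED_SAFE / FAILED_UNSAFE rule reduces to a single aggregate over the executed steps. A minimal sketch (the `is_pure` record shape is hypothetical; the real engine tracks this as `isPureExecution`):

```python
def classify_failed_execution(executed_steps: list[dict]) -> str:
    """A failed workflow is FAILED_SAFE iff every executed step was pure."""
    is_pure_execution = all(step["is_pure"] for step in executed_steps)
    return "FAILED_SAFE" if is_pure_execution else "FAILED_UNSAFE"
```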

### Configuration

Database: PostgreSQL (host port 5434 default, container 5432). CMS: GraphQL endpoint. Port: 3030. Schema: `GET /schema`. Swagger: `/api`. Health: `/health/`. Local dev: `docker-compose.yml` at repo root (engine + postgres + monitor app). Build override: `docker-compose.build.yml` for local source builds.

## Worker SDK (Python 3.12+)

### Function Block System

```python
class FunctionBlock(Generic[ParamsT, ResultDataT], ABC):
    async def run(self, params, context) -> FunctionBlockResult[ResultDataT]: ...
    async def acquire(self, params) -> FunctionBlockAcquireResult: ...
    async def rollback(self, params, context, result_from_failed) -> FunctionBlockRollbackResult: ...
```

Registration via `@register_function_block(Registration(name, package, version, run_on, fb_type, is_pure, is_idempotent))`. ParamsT: Pydantic model (`extra="ignore"`). ResultDataT: Pydantic model (`extra="forbid"`).

### Worker Architecture

Hybrid sync/async: main loop (async) handles heartbeat/polling/API; FBs execute sync in `ThreadPoolExecutor(max_workers=1)`. Sequential job processing. Blocking detector warns on sync calls in async loop.

Config: `URL_BLACKBOARD`, `DIR_FUNCTION_BLOCKS`, `WORKER_NAME`. Entry point: `neops_worker`.

### Connection System (Three Layers)

1. **Capability interfaces**: abstract method contracts (e.g., `DeviceInfoCapability.get_version()`)
2. **ConnectionProxy**: user-facing API, composes capabilities via inheritance, delegates to plugin at runtime
3. **ConnectionPlugin**: platform/library implementations. Resolution: platform → connection_type → library → capabilities

Base plugins: Scrapli, Netmiko, Napalm, NETCONF, RESTCONF, API. ProxyMeta metaclass generates fallback methods raising `NotImplementedForThisPlatform`.
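
The fallback-generation idea behind ProxyMeta can be illustrated with a small metaclass. The real ProxyMeta differs in detail; this only shows the pattern of auto-generated methods raising `NotImplementedForThisPlatform`, and the `capability_methods` attribute is a hypothetical stand-in for capability discovery:

```python
class NotImplementedForThisPlatform(Exception):
    pass

class ProxyMeta(type):
    """Generate a fallback for each declared capability method not defined."""
    def __new__(mcls, name, bases, ns):
        for method_name in ns.get("capability_methods", ()):
            if method_name not in ns:
                def fallback(self, *args, _m=method_name, **kwargs):
                    raise NotImplementedForThisPlatform(_m)
                ns[method_name] = fallback
        return super().__new__(mcls, name, bases, ns)

class ConnectionProxy(metaclass=ProxyMeta):
    capability_methods = ("get_version",)  # hypothetical capability list
```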

### Data & Context

WorkflowContext holds entity state (devices, groups, interfaces). Change tracking via deep-copy snapshot at init; `compute_db_updates()` diffs current vs. snapshot to generate `EntityCreateDto`/`EntityPatchDto`/`EntityDeleteDto`.
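
The snapshot-and-diff approach can be sketched like this (class and entity shapes are illustrative; the real `compute_db_updates()` emits typed DTOs rather than key lists):

```python
import copy

class ChangeTracker:
    """Track entity changes by diffing against an init-time deep copy."""
    def __init__(self, entities: dict[str, dict]):
        self.entities = entities
        self._snapshot = copy.deepcopy(entities)  # taken at context init

    def compute_db_updates(self) -> dict[str, list]:
        created = [k for k in self.entities if k not in self._snapshot]
        deleted = [k for k in self._snapshot if k not in self.entities]
        patched = [k for k in self.entities
                   if k in self._snapshot and self.entities[k] != self._snapshot[k]]
        return {"create": created, "patch": patched, "delete": deleted}
```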

## CMS (Django/GraphQL)

### Data Models

- **Device**: hostname, ip, username, password (encrypted), platform (FK), groups (M2M), connection_state (NEW|UNREACHABLE|NOSSH|AUTHFAILURE|OK), facts/checks (JSON auto-aggregated), soft-deletable, lockable
- **Interface**: name, ifindex, device (FK CASCADE), state (UP|DOWN|ADMIN_SHUTDOWN|ERROR_DISABLED), neighbor (self one-to-one), facts/checks
- **DeviceGroup**: name (unique), title, devices (M2M), facts/checks
- **Facts/Checks**: versioned records (key, value JSON, valid_till, purge_at), auto-aggregated into parent entity

### Integration Pattern (Acquire → Execute → Unlock)

1. Engine calls `getAndLockResources` GraphQL mutation → CMS locks entities, resolves Elasticsearch queries
2. Locked entities serialized as DTOs → passed to workers as job context
3. Workers modify entities in memory → compute diff
4. Diff sent as `dbUpdates` in job result → engine aggregates
5. Engine calls `unlockResources` with aggregated updates → CMS applies atomically

Authentication: JWT with RS256, JWKS at `/.well-known/jwks.json`, role-based permissions (BitField).

## Remote Lab (FastAPI)

Session-based lab allocation: `POST /session` → wait in FIFO queue → `ACTIVE` → use lab → `DELETE /session`. Heartbeat required (300s timeout). One lab at a time per session.
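
The FIFO allocation behavior can be modeled with a small queue sketch (illustrative only, not the Remote Lab API; it assumes one active session per lab at a time):

```python
from collections import deque

class SessionQueue:
    """FIFO allocation: one session active, the rest wait in order."""
    def __init__(self):
        self.waiting: deque[str] = deque()
        self.active: str | None = None

    def request(self, session_id: str) -> str:
        self.waiting.append(session_id)
        self._promote()
        return "ACTIVE" if self.active == session_id else "QUEUED"

    def release(self, session_id: str) -> None:
        if self.active == session_id:
            self.active = None
        self._promote()

    def _promote(self) -> None:
        if self.active is None and self.waiting:
            self.active = self.waiting.popleft()
```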

Lab lifecycle: upload topology (`POST /lab`), topology hash comparison for reuse, reference counting for shared labs, release/destroy.

Worker SDK integration via pytest fixtures: `remote_lab_fixture("tests/topologies/simple_iol.yml")`. Available topologies: `simple_iol` (2 Cisco IOL), `simple_frr` (2 FRRouting). `RemoteLabDevice.to_neops_device()` converts lab devices to `DeviceTypeDto`.

Config: `REMOTE_LAB_URL` (unset = local mode), `REMOTE_LAB_REQUEST_TIMEOUT` (30s), `REMOTE_LAB_SESSION_TIMEOUT` (300s).
70 changes: 70 additions & 0 deletions .agent/knowledge/_shared/cross-project-patterns.md
@@ -0,0 +1,70 @@
# Cross-Project Patterns

Conventions for documentation, examples, and testing that span multiple neops repositories.

## Cross-Project URL Convention

Links between project docs use absolute paths rooted at the project directory:
- Engine → SDK: `/neops-worker-sdk-py/docs/...`
- SDK → Engine: `/neops-workflow-engine/docs/...`

These resolve in the unified MkDocs multi-repo build (the `neops/` mono-repo includes all component repos as submodules under `docs/`).

## Terminology Alignment

Canonical definitions for shared terms live in `neops-ecosystem-overview.md` (Key Concepts table). When adding a new term to one project's glossary, add it to all relevant projects. Terms MUST match across all glossaries.

## Example Alignment

- All getting-started examples use **`fb.examples.neops.io`** package
- Workflow YAML `name`, `package`, and `version` fields must match between engine and SDK examples
- Engine CI: `make validate-examples` validates YAML against JSON schema
- SDK CI: pytest validates FB signatures and test cases
- When updating an example in one repo, check the other repo and update accordingly

### Runnable vs. Illustrative Examples

- **Runnable**: `echo`, `show_version`, `ping`, `configBackup` have real SDK implementations
- **Illustrative**: intermediate/advanced workflow examples use hypothetical FB names to demonstrate patterns
- Always label clearly which examples are runnable and which are illustrative

## Implementation Status Sync

Both projects must agree on implementation status for shared features:
- If the engine marks a feature as unimplemented, SDK docs must not describe it as available
- Use identical admonition style: `!!! warning "Implementation Status"`
- Periodically audit cross-project status to catch drift

## Cross-Project Onboarding

Each project's Getting Started links to the other:
- Engine "Your First Workflow" → SDK "Write Your First Function Block"
- SDK "Your First Function Block" → Engine "Run Your First Workflow"

This creates a complete onboarding loop regardless of entry point.

## Testing Patterns

### Worker SDK Testing

Two test decorators:
- `@fb_test_case(description, params, context, succeeds, assertions)` — local tests with mocked context
- `@fb_test_case_with_lab(description, params, remote_lab_fixture, assertions)` — remote lab with real devices

Context factory: `create_workflow_context(run_on, entity_id, devices, device_groups, interfaces)`.

Available lab topologies: `simple_iol` (2 Cisco IOL), `simple_frr` (2 FRRouting).

### Workflow Engine Testing

- Unit tests: Jest (`npm run test`)
- E2E tests: Supertest + PostgreSQL (`npm run test:e2e`)
- Example validation: `make validate-examples` (JSON schema validation of all YAML examples)
- CI in Docker: multi-stage Dockerfile with `--target run-ci`

### Common Pitfalls

- Hyphenated directories (`examples/getting-started/`) can't be imported as Python packages — use `sys.path` manipulation
- `DeviceTypeDto.platform` must be a `PlatformTypeDto`, not a string
- Plugin imports register via decorators at import time — order matters
- Default pytest config excludes `function_block` marker; use `-m "function_block"` explicitly
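
The first pitfall above (hyphenated example directories) is typically worked around by putting the directory itself on `sys.path` and importing its modules as top-level names. A minimal sketch; the example path and module name in the usage comment are hypothetical:

```python
import sys

def make_importable(directory: str) -> None:
    """Allow modules inside a hyphenated directory to be imported directly."""
    if directory not in sys.path:
        sys.path.insert(0, directory)

# Usage sketch:
# make_importable("examples/getting-started")
# import echo_example  # hypothetical module inside that directory
```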
37 changes: 37 additions & 0 deletions .agent/knowledge/_shared/documentation-personas.md
@@ -0,0 +1,37 @@
# Documentation Personas

Reusable persona definitions for reviewing and writing neops documentation.

## Sam — Junior Network Engineer

- **Experience**: 2 years in network operations
- **Skills**: Python basics, YAML from Ansible playbooks, CLI comfort. No typed Python (Pydantic, dataclasses) or TypeScript.
- **Mindset**: Eager to learn, needs hand-holding, appreciates fun and approachable tools
- **Reading pattern**: Getting Started → Concepts → Workflows. Skips architecture docs initially.
- **Success criteria**: Can run a hello-world workflow end-to-end within 30 minutes following only the docs

## Priya — Senior Network Engineer

- **Experience**: 15+ years across multi-vendor environments
- **Skills**: Expert in Ansible, Nornir, custom Python tooling, NETCONF/YANG. Familiar with CI/CD.
- **Mindset**: Critical, pragmatic, demands clear ROI before adopting a new tool
- **Reading pattern**: Architecture and Concepts first, then advanced features (acquire, retry, rollback). Compares with existing tools.
- **Success criteria**: Understands why neops is better than Ansible/Nornir for transaction-safe multi-device automation

## Marcus — Implementation Wizard

- **Experience**: Staff-level engineer, modern Python and TypeScript fluency
- **Skills**: Pydantic, NestJS, gRPC, MikroORM. Reads source when docs fall short.
- **Mindset**: Demands precision, completeness, and internal consistency. Notices mismatched types, missing edge cases.
- **Reading pattern**: Source-level docs, extension points, schema references first. Tutorials only if they show non-obvious patterns.
- **Success criteria**: Can extend neops with a custom handler, gateway, or FB type without asking questions

## Diana — Technical Writer (Meta-Reviewer)

- **Experience**: Multi-product technical documentation across developer platforms
- **Skills**: Evaluates structure, maintainability, audience awareness, cross-project consistency
- **Mindset**: Documentation as product. Cares about navigation, progressive disclosure, long-term maintainability.
- **Reading pattern**: Full nav structure review, then spot-checks for consistency, voice, completeness.
- **Success criteria**: Docs are navigable, internally consistent, each page serves a clear audience

For the persona review process, see `documentation-playbook.md` (QA phase).