Skip to content

Latest commit

 

History

History
476 lines (373 loc) · 17.3 KB

File metadata and controls

476 lines (373 loc) · 17.3 KB

Testing Guide — Giulia

Quick Reference

Recommended: Isolated test environment (docker-compose.test.yml)

Uses a dedicated compose file with its own ArcadeDB instance, fresh volumes, and MIX_ENV=test baked in. No interference with the running dev daemon.

# Run ALL tests (starts ArcadeDB, runs tests, exits)
docker compose -f docker-compose.test.yml run --rm giulia-test

# Run a single test file
docker compose -f docker-compose.test.yml run --rm giulia-test test/giulia/inference/state_test.exs

# Run a specific test by line number
docker compose -f docker-compose.test.yml run --rm giulia-test test/giulia/prompt/builder_test.exs:124

# Run with trace (verbose)
docker compose -f docker-compose.test.yml run --rm -e TEST_ARGS="--trace" giulia-test

# Run only previously failed tests
docker compose -f docker-compose.test.yml run --rm -e TEST_ARGS="--failed" giulia-test

# Clean up test containers and volumes when done
docker compose -f docker-compose.test.yml down -v

Alternative: exec into dev container (quick but less isolated)

Runs tests inside the running dev daemon. Faster (no startup), but CubDB state and ETS tables from the running daemon can cause flaky failures.

The daemon must be running for this approach. Start it first with docker compose up -d.

Replace /projects/Giulia below with the container-side path where your source is mounted (this matches the host path you configured in docker-compose.yml under GIULIA_PROJECTS_PATH).

# Run ALL tests
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test"

# Run a single test file
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test test/giulia/inference/state_test.exs"

# Run a specific test by line number
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test test/giulia/prompt/builder_test.exs:124"

NEVER run mix format on the full project inside Docker

If your host uses CRLF line endings (Windows), running mix format inside the Linux container converts every .ex and .exs file to LF, causing 100+ files to appear modified in git diff and spurious test failures.

If you need to format: run mix format on the host, or format only the specific files you changed — never mix format on the full project from inside the container.

Verifying changes don't break tests

Always follow this workflow when making code changes:

  1. Baseline first: git stash, run full test suite, record exact failure count + which modules
  2. Restore and test: git stash pop, run full test suite again
  3. Compare module-by-module: same modules + same failure count = zero regressions
  4. Never assume: don't claim failures are "pre-existing" without the before/after proof
# Using the isolated test environment (recommended):

# Step 1: Baseline
git stash
docker compose -f docker-compose.test.yml run --rm giulia-test 2>&1 | tail -5

# Step 2: After changes
git stash pop
docker compose -f docker-compose.test.yml run --rm giulia-test 2>&1 | tail -5

# Step 3: Compare (should show same failure modules, same counts)

Why Tests Only Work in Docker

  1. EXLA (ML backend for Nx) doesn't compile on Windows — no precompiled binaries for x86_64-windows
  2. The Docker image has a working Linux EXLA build with CPU target
  3. Tests must run from the live-mounted volume, NOT from /app (baked-in image copy)

Critical Rules

Always use the mounted source path, never /app

The Dockerfile copies source into /app at build time. That's a frozen snapshot. Your live edits are mounted under /projects/ (the path depends on your GIULIA_PROJECTS_PATH setting in docker-compose.yml). For the exec approach, always cd to the mounted path before running tests.

# WRONG — runs stale code from image build time
docker compose exec giulia-worker bash -c "MIX_ENV=test mix test"

# CORRECT — runs live code from host mount
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test"

The isolated test environment (docker-compose.test.yml) handles this automatically via the test-entrypoint.sh script.

Always set MIX_ENV=test (exec approach only)

Without it, the app tries to start Bandit on port 4000, which conflicts with the running daemon. The application.ex skips the HTTP endpoint when MIX_ENV=test:

children =
  if Mix.env() == :test do
    base_children  # no Bandit
  else
    base_children ++ [{Bandit, plug: Giulia.Daemon.Endpoint, port: port}]
  end

The isolated test environment sets MIX_ENV=test automatically.

Test File Conventions

  • Test files live under test/giulia/ mirroring the lib/giulia/ structure
  • File naming: lib/giulia/foo/bar.extest/giulia/foo/bar_test.exs
  • Adversarial tests: test/giulia/foo/bar_adversarial_test.exs
  • Integration tests: test/integration/ (full HTTP stack via Plug.Test)
  • All tests use use ExUnit.Case, async: true unless they need shared state (GenServers, ETS)
  • Use Code.ensure_loaded!(Module) before function_exported?/3 checks — aliases don't trigger module loading

Current Coverage (Build 138)

84+ test files, 1707 tests

Known Failures

Build 138: 1,707 tests, 0 failures (isolated test environment).

All previously known failures (ArcadeDB setup, CubDB corruption, ETS state pollution, private function tests) were fixed in Build 138. If you see failures, they are real regressions — investigate before merging.

Test Categories

Category Files Tests Description
Unit tests 68 ~1111 Direct module testing
Adversarial tests 15 ~371 Malformed input, edge cases, security
Integration tests 1 52 Full HTTP stack via Plug.Test

Modules WITH Tests (68 unique modules covered)

Core

  • core/context_manager.excontext_manager_test.exs
  • core/path_mapper.expath_mapper_test.exs
  • core/path_sandbox.expath_sandbox_test.exs + path_sandbox_adversarial_test.exs

AST

  • ast/analysis.exanalysis_test.exs
  • ast/extraction.exextraction_test.exs
  • ast/patcher.expatcher_test.exs
  • ast/processor.exprocessor_test.exs + pathological_test.exs
  • ast/slicer.exslicer_test.exs

Context

  • context/builder.exbuilder_test.exs
  • context/indexer.exindexer_test.exs + indexer_race_test.exs
  • context/store.exstore_test.exs + store_concurrency_test.exs
  • context/store/formatter.exstore/formatter_test.exs
  • context/store/query.exstore/query_test.exs

Inference

  • inference/approval.exapproval_test.exs
  • inference/bulk_replace.exbulk_replace_test.exs
  • inference/context_builder.excontext_builder_test.exs
  • inference/context_builder/helpers.excontext_builder/helpers_test.exs
  • inference/context_builder/intervention.excontext_builder/intervention_test.exs
  • inference/context_builder/messages.excontext_builder/messages_test.exs
  • inference/context_builder/preview.excontext_builder/preview_test.exs
  • inference/engine.exengine_test.exs
  • inference/engine/helpers.exengine/helpers_test.exs
  • inference/engine/step.exengine/step_test.exs
  • inference/escalation.exescalation_test.exs
  • inference/events.exevents_test.exs
  • inference/orchestrator.exorchestrator_test.exs
  • inference/pool.expool_test.exs
  • inference/rename_mfa.exrename_mfa_test.exs
  • inference/response_parser.exresponse_parser_test.exs
  • inference/state.exstate_test.exs
  • inference/state/counters.exstate/counters_test.exs
  • inference/state/tracking.exstate/tracking_test.exs
  • inference/tool_dispatch.extool_dispatch_test.exs
  • inference/tool_dispatch/guards.extool_dispatch/guards_test.exs
  • inference/trace.extrace_test.exs
  • inference/transaction.extransaction_test.exs
  • inference/verification.exverification_test.exs

Intelligence

  • intelligence/architect_brief.exarchitect_brief_test.exs
  • intelligence/plan_validator.explan_validator_test.exs
  • intelligence/preflight.expreflight_test.exs

Knowledge

  • knowledge/analyzer.exanalyzer_test.exs
  • knowledge/builder.exbuilder_test.exs + builder_adversarial_test.exs
  • knowledge/insights.exinsights_test.exs
  • knowledge/macro_map.exmacro_map_test.exs
  • knowledge/metrics.exmetrics_test.exs
  • knowledge/store.exstore_test.exs + partial_graph_test.exs
  • knowledge/topology.extopology_test.exs

Monitor

  • monitor/store.exstore_test.exs + store_adversarial_test.exs
  • monitor/telemetry.extelemetry_adversarial_test.exs

Persistence

  • persistence/loader.exloader_test.exs + loader_adversarial_test.exs
  • persistence/merkle.exmerkle_test.exs + merkle_adversarial_test.exs
  • persistence/store.exstore_test.exs + store_adversarial_test.exs
  • persistence/writer.exwriter_test.exs + writer_adversarial_test.exs

Providers (via adversarial test)

  • provider/anthropic.exresponse_adversarial_test.exs
  • provider/gemini.exresponse_adversarial_test.exs
  • provider/groq.exresponse_adversarial_test.exs
  • provider/lm_studio.exresponse_adversarial_test.exs
  • provider/router.exrouter_test.exs
  • provider.exprovider_test.exs

Tools

  • tools/commit_changes.excommit_changes_test.exs
  • tools/get_staged_files.exget_staged_files_test.exs
  • tools/read_file.exread_file_test.exs
  • tools/registry.exregistry_test.exs
  • tools/respond.exrespond_test.exs
  • tools/think.exthink_test.exs
  • All 22 tool modules — tools_contract_test.exs (name, description, parameters)

Other

  • structured_output.exstructured_output_test.exs + structured_output_adversarial_test.exs
  • structured_output/parser.exparser_test.exs + parser_adversarial_test.exs
  • prompt/builder.exbuilder_test.exs
  • utils/diff.exdiff_test.exs
  • version.exversion_test.exs
  • daemon/helpers.exhelpers_test.exs

Modules Covered by Integration Tests Only (10)

These modules have no dedicated unit test file but are exercised end-to-end by test/integration/api_adversarial_test.exs (52 tests via Plug.Test):

  • daemon/endpoint.ex — all core routes (health, command, ping, status, projects)
  • daemon/routers/index.ex — 6 routes tested
  • daemon/routers/knowledge.ex — 11 routes tested
  • daemon/routers/monitor.ex — 3 routes tested
  • daemon/routers/discovery.ex — 3 routes tested
  • daemon/skill_router.ex — exercised via all sub-routers
  • inference/engine/response.ex — via POST /api/command
  • inference/tool_dispatch/executor.ex — via tool execution pipeline
  • inference/tool_dispatch/staging.ex — via /api/transaction
  • inference/tool_dispatch/approval.ex — via /api/approval

Modules WITHOUT Tests (40)

Grouped by reason:

Not Testable Without External Dependencies (13)

These require live LLM connections, external services, or multi-node setups:

  • client.ex — HTTP client entry point (needs running daemon)
  • client/approval.ex — approval UI (needs daemon + LLM)
  • client/commands.ex — CLI command dispatch (needs daemon)
  • client/daemon.ex — daemon lifecycle management
  • client/http.ex — HTTP request helpers (needs daemon)
  • client/output.ex — terminal output formatting
  • client/renderer.ex — Owl TUI rendering (visual)
  • client/repl.ex — interactive REPL loop
  • provider/ollama.ex — Ollama API (needs running Ollama)
  • intelligence/embedding_serving.ex — EXLA model loading (tested indirectly)
  • intelligence/semantic_index.ex — embedding search (needs EmbeddingServing)
  • runtime/collector.ex — BEAM runtime data (needs live processes)
  • runtime/inspector.ex — BEAM introspection (needs live processes)

Thin Dispatch / Orchestration Layers (8)

These are mostly glue code — the modules they call ARE tested:

  • application.ex — OTP supervision tree (startup only)
  • inference/engine/commit.ex — commit dispatch (calls tested modules)
  • inference/engine/startup.ex — engine init (calls tested modules)
  • inference/supervisor.ex — supervisor spec (OTP boilerplate)
  • inference/tool_dispatch/special.ex — bulk ops dispatch
  • intelligence/surgical_briefing.ex — briefing composition
  • knowledge/behaviours.ex — behaviour analysis
  • knowledge/store/reader.ex — store read delegation

Sub-Routers (4)

Partially covered by integration tests, remaining routes untested:

  • daemon/routers/approval.ex — 2 routes
  • daemon/routers/intelligence.ex — 4 routes
  • daemon/routers/runtime.ex — 8 routes
  • daemon/routers/search.ex — 3 routes
  • daemon/routers/transaction.ex — 3 routes

Tool Modules Without Dedicated Tests (15)

All have contract tests (name/description/parameters) via tools_contract_test.exs. Missing execution tests for:

  • tools/bulk_replace.ex
  • tools/cycle_check.ex
  • tools/edit_file.ex
  • tools/get_context.ex
  • tools/get_function.ex
  • tools/get_impact_map.ex
  • tools/get_module_info.ex
  • tools/list_files.ex
  • tools/lookup_function.ex
  • tools/patch_function.ex
  • tools/rename_mfa.ex
  • tools/run_mix.ex
  • tools/run_tests.ex
  • tools/search_code.ex
  • tools/search_meaning.ex
  • tools/trace_path.ex
  • tools/write_file.ex
  • tools/write_function.ex

Integration Test Procedure

Integration tests at test/integration/api_adversarial_test.exs use Plug.Test to simulate HTTP requests without a TCP connection. They exercise the full stack:

HTTP Request → Plug.Router → GenServer → ETS → Business Logic → JSON Response

How it works

use Plug.Test
alias Giulia.Daemon.Endpoint

@opts Endpoint.init([])
@project_path "/projects/Giulia"  # matches the volume mount in docker-compose.yml

# Simulate GET with query params
defp get(path, query_params \\ %{}) do
  query = URI.encode_query(query_params)
  full_path = if query == "", do: path, else: "#{path}?#{query}"
  :get |> conn(full_path) |> Endpoint.call(@opts)
end

# Simulate POST with JSON body
defp post(path, body) do
  :post
  |> conn(path, Jason.encode!(body))
  |> put_req_header("content-type", "application/json")
  |> Endpoint.call(@opts)
end

What it covers

The 6 dispatcher modules that can't be unit-tested without mocking their downstream dependencies. Instead, we test the real pipeline end-to-end:

Module Tested Via What's Validated
Engine.Response POST /api/command Unknown command → error JSON
ToolDispatch.Executor POST /api/command Tool pipeline doesn't crash
ToolDispatch.Staging POST /api/transaction Transaction lifecycle
ToolDispatch.Approval POST /api/approval Approval flow
Engine.Commit POST /api/command Staged file commit
ToolDispatch.Special POST /api/command Bulk operation dispatch

Key insight

In a test environment, the property graph may be empty (no project scanned). Tests for module-specific queries (centrality, dependents, etc.) accept both 200 (graph populated) and 404 (module not found) — they validate the HTTP layer, not the business logic.

Writing New Tests

Template for pure function modules

defmodule Giulia.Foo.BarTest do
  use ExUnit.Case, async: true

  alias Giulia.Foo.Bar

  describe "function_name/arity" do
    test "handles normal input" do
      assert Bar.function_name(input) == expected
    end

    test "handles edge case" do
      assert Bar.function_name(nil) == {:error, :invalid}
    end
  end
end

Template for GenServer modules

defmodule Giulia.Foo.ServerTest do
  use ExUnit.Case, async: false  # shared state

  setup do
    # Start isolated instance if possible
    {:ok, pid} = Giulia.Foo.Server.start_link(name: nil)
    %{pid: pid}
  end

  test "does something", %{pid: pid} do
    assert GenServer.call(pid, :something) == :ok
  end
end

Template for Tool modules

defmodule Giulia.Tools.MyToolTest do
  use ExUnit.Case, async: true

  alias Giulia.Tools.MyTool

  test "name/0 returns tool name" do
    assert MyTool.name() == "my_tool"
  end

  test "description/0 is non-empty" do
    assert is_binary(MyTool.description())
    assert String.length(MyTool.description()) > 0
  end

  test "parameters/0 returns valid schema" do
    params = MyTool.parameters()
    assert is_map(params)
    assert Map.has_key?(params, "type")
  end
end

Template for adversarial tests

defmodule Giulia.Foo.BarAdversarialTest do
  @moduledoc """
  Adversarial tests for Foo.Bar.

  Targets: nil inputs, huge inputs, malformed data, type confusion,
  concurrent access, boundary values.
  """
  use ExUnit.Case, async: true  # or false if shared state

  alias Giulia.Foo.Bar

  describe "function_name with malformed input" do
    test "nil input" do
      # Should not crash
      result = Bar.function_name(nil)
      assert result == {:error, _} or result == :default
    end

    test "extremely large input" do
      big = String.duplicate("x", 100_000)
      result = Bar.function_name(big)
      assert is_binary(result) or match?({:error, _}, result)
    end
  end
end