Uses a dedicated compose file with its own ArcadeDB instance, fresh volumes,
and MIX_ENV=test baked in. No interference with the running dev daemon.
# Run ALL tests (starts ArcadeDB, runs tests, exits)
docker compose -f docker-compose.test.yml run --rm giulia-test
# Run a single test file
docker compose -f docker-compose.test.yml run --rm giulia-test test/giulia/inference/state_test.exs
# Run a specific test by line number
docker compose -f docker-compose.test.yml run --rm giulia-test test/giulia/prompt/builder_test.exs:124
# Run with trace (verbose)
docker compose -f docker-compose.test.yml run --rm -e TEST_ARGS="--trace" giulia-test
# Run only previously failed tests
docker compose -f docker-compose.test.yml run --rm -e TEST_ARGS="--failed" giulia-test
# Clean up test containers and volumes when done
docker compose -f docker-compose.test.yml down -v

Runs tests inside the running dev daemon. Faster (no startup), but CubDB state and ETS tables from the running daemon can cause flaky failures.
The daemon must be running for this approach. Start it first with `docker compose up -d`.
Replace /projects/YourProject below with the container-side path where your source
is mounted (this matches the host path you configured in docker-compose.yml
under GIULIA_PROJECTS_PATH).
# Run ALL tests
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test"
# Run a single test file
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test test/giulia/inference/state_test.exs"
# Run a specific test by line number
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test test/giulia/prompt/builder_test.exs:124"

If your host uses CRLF line endings (Windows), running `mix format` inside the
Linux container converts every .ex and .exs file to LF, causing 100+ files
to appear modified in git diff and spurious test failures.
If you need to format: run `mix format` on the host, or format only the specific
files you changed — never run `mix format` on the full project from inside the container.
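One way to catch this before it bites (a hypothetical pre-format check, not part of the project's tooling) is to scan for CRLF files from the host; the sketch below uses a throwaway directory so it is self-contained:

```shell
# Hypothetical pre-format check: list .ex/.exs files that still contain
# CRLF line endings (-I skips binaries, -l prints matching file names).
dir=$(mktemp -d)
printf 'defmodule A do\r\nend\r\n' > "$dir/a.ex"   # demo file with CRLF
printf 'defmodule B do\nend\n' > "$dir/b.ex"       # demo file with LF
grep -rlI --include='*.ex' --include='*.exs' $'\r' "$dir"
```

Run against a real project root, the same `grep` invocation lists exactly the files that would flip in `git diff` after an in-container format.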
Always follow this workflow when making code changes:
- Baseline first: git stash, run full test suite, record exact failure count + which modules
- Restore and test: git stash pop, run full test suite again
- Compare module-by-module: same modules + same failure count = zero regressions
- Never assume: don't claim failures are "pre-existing" without the before/after proof
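For the comparison step, the failure count can be pulled out of ExUnit's summary line mechanically; a minimal shell sketch (the sample summary string is hypothetical):

```shell
# Hypothetical helper: extract the failure count from an ExUnit summary
# line such as "1707 tests, 3 failures" captured from a saved test log.
summary="1707 tests, 3 failures"
failures=$(printf '%s\n' "$summary" | sed -n 's/.*, \([0-9][0-9]*\) failure.*/\1/p')
echo "$failures"
```

Capture this number before and after your change; matching counts plus matching failing modules is the "zero regressions" proof the workflow asks for.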
# Using the isolated test environment (recommended):
# Step 1: Baseline
git stash
docker compose -f docker-compose.test.yml run --rm giulia-test 2>&1 | tail -5
# Step 2: After changes
git stash pop
docker compose -f docker-compose.test.yml run --rm giulia-test 2>&1 | tail -5
# Step 3: Compare (should show same failure modules, same counts)

- EXLA (ML backend for Nx) doesn't compile on Windows — no precompiled binaries for x86_64-windows
- The Docker image has a working Linux EXLA build with CPU target
- Tests must run from the live-mounted volume, NOT from /app (baked-in image copy)
The Dockerfile copies source into /app at build time. That's a frozen snapshot.
Your live edits are mounted under /projects/ (the path depends on your
GIULIA_PROJECTS_PATH setting in docker-compose.yml). For the exec approach,
always cd to the mounted path before running tests.
# WRONG — runs stale code from image build time
docker compose exec giulia-worker bash -c "MIX_ENV=test mix test"
# CORRECT — runs live code from host mount
docker compose exec giulia-worker bash -c "cd /projects/YourProject && MIX_ENV=test mix test"

The isolated test environment (docker-compose.test.yml) handles this automatically
via the test-entrypoint.sh script.
Without it, the app tries to start Bandit on port 4000, which conflicts with the running daemon.
The application.ex skips the HTTP endpoint when MIX_ENV=test:
children =
  if Mix.env() == :test do
    base_children # no Bandit
  else
    base_children ++ [{Bandit, plug: Giulia.Daemon.Endpoint, port: port}]
  end

The isolated test environment sets MIX_ENV=test automatically.
- Test files live under `test/giulia/`, mirroring the `lib/giulia/` structure
- File naming: `lib/giulia/foo/bar.ex` → `test/giulia/foo/bar_test.exs`
- Adversarial tests: `test/giulia/foo/bar_adversarial_test.exs`
- Integration tests: `test/integration/` (full HTTP stack via Plug.Test)
- All tests use `use ExUnit.Case, async: true` unless they need shared state (GenServers, ETS)
- Use `Code.ensure_loaded!(Module)` before `function_exported?/3` checks — aliases don't trigger module loading
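Module loading on the BEAM is lazy, which is why the last rule matters; a minimal sketch (using a stdlib module purely for illustration):

```elixir
# function_exported?/3 only consults the table of *loaded* modules; it
# never loads the module itself, and neither does an alias. Force-load
# first, then the check is reliable:
Code.ensure_loaded!(Enum)
function_exported?(Enum, :map, 2)
# => true

# Without ensure_loaded!, a module nothing has touched yet can report
# false for functions it really does export.
```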
84+ test files, 1,707 tests
Build 138: 1,707 tests, 0 failures (isolated test environment).
All previously known failures (ArcadeDB setup, CubDB corruption, ETS state pollution, private function tests) were fixed in Build 138. If you see failures, they are real regressions — investigate before merging.
| Category | Files | Tests | Description |
|---|---|---|---|
| Unit tests | 68 | ~1111 | Direct module testing |
| Adversarial tests | 15 | ~371 | Malformed input, edge cases, security |
| Integration tests | 1 | 52 | Full HTTP stack via Plug.Test |
- `core/context_manager.ex` — `context_manager_test.exs`
- `core/path_mapper.ex` — `path_mapper_test.exs`
- `core/path_sandbox.ex` — `path_sandbox_test.exs` + `path_sandbox_adversarial_test.exs`

- `ast/analysis.ex` — `analysis_test.exs`
- `ast/extraction.ex` — `extraction_test.exs`
- `ast/patcher.ex` — `patcher_test.exs`
- `ast/processor.ex` — `processor_test.exs` + `pathological_test.exs`
- `ast/slicer.ex` — `slicer_test.exs`

- `context/builder.ex` — `builder_test.exs`
- `context/indexer.ex` — `indexer_test.exs` + `indexer_race_test.exs`
- `context/store.ex` — `store_test.exs` + `store_concurrency_test.exs`
- `context/store/formatter.ex` — `store/formatter_test.exs`
- `context/store/query.ex` — `store/query_test.exs`

- `inference/approval.ex` — `approval_test.exs`
- `inference/bulk_replace.ex` — `bulk_replace_test.exs`
- `inference/context_builder.ex` — `context_builder_test.exs`
- `inference/context_builder/helpers.ex` — `context_builder/helpers_test.exs`
- `inference/context_builder/intervention.ex` — `context_builder/intervention_test.exs`
- `inference/context_builder/messages.ex` — `context_builder/messages_test.exs`
- `inference/context_builder/preview.ex` — `context_builder/preview_test.exs`
- `inference/engine.ex` — `engine_test.exs`
- `inference/engine/helpers.ex` — `engine/helpers_test.exs`
- `inference/engine/step.ex` — `engine/step_test.exs`
- `inference/escalation.ex` — `escalation_test.exs`
- `inference/events.ex` — `events_test.exs`
- `inference/orchestrator.ex` — `orchestrator_test.exs`
- `inference/pool.ex` — `pool_test.exs`
- `inference/rename_mfa.ex` — `rename_mfa_test.exs`
- `inference/response_parser.ex` — `response_parser_test.exs`
- `inference/state.ex` — `state_test.exs`
- `inference/state/counters.ex` — `state/counters_test.exs`
- `inference/state/tracking.ex` — `state/tracking_test.exs`
- `inference/tool_dispatch.ex` — `tool_dispatch_test.exs`
- `inference/tool_dispatch/guards.ex` — `tool_dispatch/guards_test.exs`
- `inference/trace.ex` — `trace_test.exs`
- `inference/transaction.ex` — `transaction_test.exs`
- `inference/verification.ex` — `verification_test.exs`

- `intelligence/architect_brief.ex` — `architect_brief_test.exs`
- `intelligence/plan_validator.ex` — `plan_validator_test.exs`
- `intelligence/preflight.ex` — `preflight_test.exs`

- `knowledge/analyzer.ex` — `analyzer_test.exs`
- `knowledge/builder.ex` — `builder_test.exs` + `builder_adversarial_test.exs`
- `knowledge/insights.ex` — `insights_test.exs`
- `knowledge/macro_map.ex` — `macro_map_test.exs`
- `knowledge/metrics.ex` — `metrics_test.exs`
- `knowledge/store.ex` — `store_test.exs` + `partial_graph_test.exs`
- `knowledge/topology.ex` — `topology_test.exs`

- `monitor/store.ex` — `store_test.exs` + `store_adversarial_test.exs`
- `monitor/telemetry.ex` — `telemetry_adversarial_test.exs`

- `persistence/loader.ex` — `loader_test.exs` + `loader_adversarial_test.exs`
- `persistence/merkle.ex` — `merkle_test.exs` + `merkle_adversarial_test.exs`
- `persistence/store.ex` — `store_test.exs` + `store_adversarial_test.exs`
- `persistence/writer.ex` — `writer_test.exs` + `writer_adversarial_test.exs`

- `provider/anthropic.ex` — `response_adversarial_test.exs`
- `provider/gemini.ex` — `response_adversarial_test.exs`
- `provider/groq.ex` — `response_adversarial_test.exs`
- `provider/lm_studio.ex` — `response_adversarial_test.exs`
- `provider/router.ex` — `router_test.exs`
- `provider.ex` — `provider_test.exs`

- `tools/commit_changes.ex` — `commit_changes_test.exs`
- `tools/get_staged_files.ex` — `get_staged_files_test.exs`
- `tools/read_file.ex` — `read_file_test.exs`
- `tools/registry.ex` — `registry_test.exs`
- `tools/respond.ex` — `respond_test.exs`
- `tools/think.ex` — `think_test.exs`
- All 22 tool modules — `tools_contract_test.exs` (name, description, parameters)

- `structured_output.ex` — `structured_output_test.exs` + `structured_output_adversarial_test.exs`
- `structured_output/parser.ex` — `parser_test.exs` + `parser_adversarial_test.exs`
- `prompt/builder.ex` — `builder_test.exs`
- `utils/diff.ex` — `diff_test.exs`
- `version.ex` — `version_test.exs`
- `daemon/helpers.ex` — `helpers_test.exs`
These modules have no dedicated unit test file but are exercised end-to-end
by test/integration/api_adversarial_test.exs (52 tests via Plug.Test):
- `daemon/endpoint.ex` — all core routes (health, command, ping, status, projects)
- `daemon/routers/index.ex` — 6 routes tested
- `daemon/routers/knowledge.ex` — 11 routes tested
- `daemon/routers/monitor.ex` — 3 routes tested
- `daemon/routers/discovery.ex` — 3 routes tested
- `daemon/skill_router.ex` — exercised via all sub-routers
- `inference/engine/response.ex` — via POST /api/command
- `inference/tool_dispatch/executor.ex` — via tool execution pipeline
- `inference/tool_dispatch/staging.ex` — via /api/transaction
- `inference/tool_dispatch/approval.ex` — via /api/approval
Grouped by reason:
These require live LLM connections, external services, or multi-node setups:
- `client.ex` — HTTP client entry point (needs running daemon)
- `client/approval.ex` — approval UI (needs daemon + LLM)
- `client/commands.ex` — CLI command dispatch (needs daemon)
- `client/daemon.ex` — daemon lifecycle management
- `client/http.ex` — HTTP request helpers (needs daemon)
- `client/output.ex` — terminal output formatting
- `client/renderer.ex` — Owl TUI rendering (visual)
- `client/repl.ex` — interactive REPL loop
- `provider/ollama.ex` — Ollama API (needs running Ollama)
- `intelligence/embedding_serving.ex` — EXLA model loading (tested indirectly)
- `intelligence/semantic_index.ex` — embedding search (needs EmbeddingServing)
- `runtime/collector.ex` — BEAM runtime data (needs live processes)
- `runtime/inspector.ex` — BEAM introspection (needs live processes)
These are mostly glue code — the modules they call ARE tested:
- `application.ex` — OTP supervision tree (startup only)
- `inference/engine/commit.ex` — commit dispatch (calls tested modules)
- `inference/engine/startup.ex` — engine init (calls tested modules)
- `inference/supervisor.ex` — supervisor spec (OTP boilerplate)
- `inference/tool_dispatch/special.ex` — bulk ops dispatch
- `intelligence/surgical_briefing.ex` — briefing composition
- `knowledge/behaviours.ex` — behaviour analysis
- `knowledge/store/reader.ex` — store read delegation
Partially covered by integration tests, remaining routes untested:
- `daemon/routers/approval.ex` — 2 routes
- `daemon/routers/intelligence.ex` — 4 routes
- `daemon/routers/runtime.ex` — 8 routes
- `daemon/routers/search.ex` — 3 routes
- `daemon/routers/transaction.ex` — 3 routes
All have contract tests (name/description/parameters) via tools_contract_test.exs.
Missing execution tests for:
- `tools/bulk_replace.ex`
- `tools/cycle_check.ex`
- `tools/edit_file.ex`
- `tools/get_context.ex`
- `tools/get_function.ex`
- `tools/get_impact_map.ex`
- `tools/get_module_info.ex`
- `tools/list_files.ex`
- `tools/lookup_function.ex`
- `tools/patch_function.ex`
- `tools/rename_mfa.ex`
- `tools/run_mix.ex`
- `tools/run_tests.ex`
- `tools/search_code.ex`
- `tools/search_meaning.ex`
- `tools/trace_path.ex`
- `tools/write_file.ex`
- `tools/write_function.ex`
Integration tests at test/integration/api_adversarial_test.exs use Plug.Test
to simulate HTTP requests without a TCP connection. They exercise the full stack:
HTTP Request → Plug.Router → GenServer → ETS → Business Logic → JSON Response
use Plug.Test

alias Giulia.Daemon.Endpoint

@opts Endpoint.init([])
@project_path "/projects/Giulia" # matches the volume mount in docker-compose.yml

# Simulate GET with query params
defp get(path, query_params \\ %{}) do
  query = URI.encode_query(query_params)
  full_path = if query == "", do: path, else: "#{path}?#{query}"
  :get |> conn(full_path) |> Endpoint.call(@opts)
end

# Simulate POST with JSON body
defp post(path, body) do
  :post
  |> conn(path, Jason.encode!(body))
  |> put_req_header("content-type", "application/json")
  |> Endpoint.call(@opts)
end

The 6 dispatcher modules below can't be unit-tested without mocking their downstream dependencies, so we test the real pipeline end-to-end instead:
| Module | Tested Via | What's Validated |
|---|---|---|
| Engine.Response | POST /api/command | Unknown command → error JSON |
| ToolDispatch.Executor | POST /api/command | Tool pipeline doesn't crash |
| ToolDispatch.Staging | POST /api/transaction | Transaction lifecycle |
| ToolDispatch.Approval | POST /api/approval | Approval flow |
| Engine.Commit | POST /api/command | Staged file commit |
| ToolDispatch.Special | POST /api/command | Bulk operation dispatch |
In a test environment, the property graph may be empty (no project scanned). Tests for module-specific queries (centrality, dependents, etc.) accept both 200 (graph populated) and 404 (module not found) — they validate the HTTP layer, not the business logic.
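As a sketch, such a test simply accepts either status (the route name and module name here are hypothetical; `get/1` is the helper defined earlier in this document):

```elixir
test "centrality query for a module that may not be indexed" do
  conn = get("/api/knowledge/centrality", %{"module" => "Giulia.DoesNotExist"})
  # 200 if the graph is populated, 404 if not; either way the request
  # was routed, handled, and answered, which is what this test validates.
  assert conn.status in [200, 404]
end
```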
defmodule Giulia.Foo.BarTest do
  use ExUnit.Case, async: true

  alias Giulia.Foo.Bar

  describe "function_name/arity" do
    test "handles normal input" do
      assert Bar.function_name(input) == expected
    end

    test "handles edge case" do
      assert Bar.function_name(nil) == {:error, :invalid}
    end
  end
end

defmodule Giulia.Foo.ServerTest do
  use ExUnit.Case, async: false # shared state

  setup do
    # Start isolated instance if possible
    {:ok, pid} = Giulia.Foo.Server.start_link(name: nil)
    %{pid: pid}
  end

  test "does something", %{pid: pid} do
    assert GenServer.call(pid, :something) == :ok
  end
end

defmodule Giulia.Tools.MyToolTest do
  use ExUnit.Case, async: true

  alias Giulia.Tools.MyTool

  test "name/0 returns tool name" do
    assert MyTool.name() == "my_tool"
  end

  test "description/0 is non-empty" do
    assert is_binary(MyTool.description())
    assert String.length(MyTool.description()) > 0
  end

  test "parameters/0 returns valid schema" do
    params = MyTool.parameters()
    assert is_map(params)
    assert Map.has_key?(params, "type")
  end
end

defmodule Giulia.Foo.BarAdversarialTest do
  @moduledoc """
  Adversarial tests for Foo.Bar.

  Targets: nil inputs, huge inputs, malformed data, type confusion,
  concurrent access, boundary values.
  """
  use ExUnit.Case, async: true # or false if shared state

  alias Giulia.Foo.Bar

  describe "function_name with malformed input" do
    test "nil input" do
      # Should not crash
      result = Bar.function_name(nil)
      assert match?({:error, _}, result) or result == :default
    end

    test "extremely large input" do
      big = String.duplicate("x", 100_000)
      result = Bar.function_name(big)
      assert is_binary(result) or match?({:error, _}, result)
    end
  end
end