feat(evals): add Scala retrieval eval and fix dropped infix/field-expression Scala calls #567

Merged
vitali87 merged 4 commits into main from eval/scala-retrieval on Jul 1, 2026
Conversation

@vitali87 vitali87 commented Jul 1, 2026

Owner

What

Adds a Scala CALLS-vs-scalameta retrieval eval (the last cgr-supported language without one) and fixes a real cgr bug it surfaced.

The eval

evals/scala_retrieval.py grades cgr's Scala CALLS edges, reduced to (caller_file, callee_simple_name), against call sites extracted by scalameta over the same first-party name universe (every declared def). The oracle (evals/oracles/scala_oracle/Oracle.scala, run via scala-cli) is independent of cgr's tree-sitter frontend. It grades only files it parses cleanly (covered set; the cgr side is held to the same files), so a Scala 3 file the 2.13 parser rejects penalizes neither side. Requires scala-cli on PATH; the eval exits cleanly if absent. Pinned by codebase_rag/tests/test_scala_retrieval_eval.py.
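The grading described above can be sketched as plain set arithmetic. This is a minimal illustration, not the code in evals/scala_retrieval.py: the names `reduce_edges` and `score`, the sample file names, and the convention of scoring an empty side as 1.0 are all assumptions made for the example.

```python
from typing import Iterable

Edge = tuple[str, str]  # (caller_file, callee_simple_name)


def reduce_edges(qualified: Iterable[tuple[str, str]]) -> set[Edge]:
    """Reduce (caller_file, callee_qualified_name) pairs to simple-name edges."""
    return {(path, qn.rsplit(".", 1)[-1]) for path, qn in qualified}


def score(
    cgr: set[Edge], oracle: set[Edge], covered: set[str], universe: set[str]
) -> tuple[float, float]:
    """Return (precision, recall) over covered files and first-party names only."""
    got = {e for e in cgr if e[0] in covered and e[1] in universe}
    want = {e for e in oracle if e[0] in covered and e[1] in universe}
    tp = len(got & want)
    # Assumed convention: an empty side scores 1.0 rather than dividing by zero.
    precision = tp / len(got) if got else 1.0
    recall = tp / len(want) if want else 1.0
    return precision, recall


# Hypothetical data: one file the oracle parsed, one it rejected.
cgr_raw = [
    ("Parsers.scala", "demo.Parsers.~>"),
    ("Scala3Only.scala", "demo.New.run"),  # dropped: file is not in the covered set
]
oracle_edges = {("Parsers.scala", "~>"), ("Parsers.scala", "map")}
covered_files = {"Parsers.scala"}
declared_defs = {"~>", "map", "run"}

precision, recall = score(reduce_edges(cgr_raw), oracle_edges, covered_files, declared_defs)
```

Filtering both sides by the same covered set is what keeps a file the oracle cannot parse from counting against either cgr or the oracle.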

The bug

cgr's Scala spec lists infix_expression and field_expression as call node types, so both are collected, but call_processor._get_call_target_name had no case for either:

  • an infix_expression (a ~> b, xs map f) exposes its callee through an operator field;
  • a paren-less nullary call (obj.done) is a bare field_expression whose callee is its field.

Neither has the function field the extractor keyed on, so every such call returned no name and never attached. The fix adds two Scala-gated cases (the field_expression node type collides with C++, so the guard keeps other languages inert).
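The two cases as described here can be sketched against a stand-in node type. This is an illustration under stated assumptions, not the code in call_processor.py: `Node` is a hypothetical substitute for a tree-sitter node (it only mimics `type`, `text`, and `child_by_field_name`), and `call_target_name` is an invented name. The merged extractor may differ from this description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""

    type: str
    text: str = ""
    fields: dict[str, "Node"] = field(default_factory=dict)

    def child_by_field_name(self, name: str) -> Optional["Node"]:
        return self.fields.get(name)


def call_target_name(node: Node, language: str) -> Optional[str]:
    """Return the callee name for a call node, or None if it cannot be named."""
    # Common case: the callee sits in a `function` field.
    func = node.child_by_field_name("function")
    if func is not None:
        return func.text
    # Scala-gated cases: `field_expression` is also a C++ node type,
    # so the language check keeps other languages untouched.
    if language == "scala":
        if node.type == "infix_expression":
            op = node.child_by_field_name("operator")
            return op.text if op is not None else None
        if node.type == "field_expression":
            fld = node.child_by_field_name("field")
            return fld.text if fld is not None else None
    return None


infix = Node("infix_expression", "a ~> b", {"operator": Node("operator_identifier", "~>")})
select = Node("field_expression", "obj.done", {"field": Node("identifier", "done")})
```

With these inputs, `call_target_name(infix, "scala")` yields the operator name, while the same node types under another language fall through to `None`.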

Result

On scala-parser-combinators (a combinator DSL, so infix-operator heavy): recall 0.40 → 0.73, F1 0.57 → 0.84, precision held at 1.0 (zero false positives). scopt shows the same shape (precision 1.0, recall 0.71). The residual is the diffuse receiver-type-inference tail every language eval carries (implicit conversions such as "" ~>, deeply generic receivers), not a systematic gap.
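As a quick consistency check, the reported F1 values follow from the reported precision and recall through the usual harmonic mean. The helper name `f1` is only for this example.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


before = f1(1.0, 0.40)  # precision held at 1.0, recall before the fix
after = f1(1.0, 0.73)   # recall after the fix
print(f"{before:.2f} -> {after:.2f}")  # prints "0.57 -> 0.84"
```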

Testing

RED→GREEN in history: the eval + fixture test commit is red for the infix/field-expression assertions; the fix commit turns it green. Full non-integration suite (4121 passed) and integration suite (142 passed, serial) green; call/language-focused subset (1541 passed) confirms no cross-language regression.

vitali87 commented Jul 1, 2026

Owner Author

@greptile review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces a retrieval evaluation for Scala that compares cgr's Scala call graph against an independent oracle built on scalameta (run via scala-cli). It also fixes a bug in call_processor.py where Scala infix operator calls and nullary method calls were dropped, and adds corresponding tests, constants, logging, and documentation. The review feedback suggests improving error handling in evals/oracles/scala_oracle.py when parsing the subprocess's JSON output: catch JSONDecodeError and include stderr and stdout in the error message to make debugging easier.

Comment thread evals/oracles/scala_oracle.py Outdated
Comment on lines +31 to +38
proc = subprocess.run(
    [scala_cli, ec.SCALA_CLI_RUN, str(_SOURCE), ec.SCALA_CLI_ARG_SEP, str(target)],
    cwd=str(_ORACLE_DIR),
    capture_output=True,
    text=True,
    check=True,
)
payload: OraclePayload = json.loads(proc.stdout or "{}")

Contributor


medium

Since capture_output=True is used, any warnings or errors printed to stderr (or unexpected output on stdout) will be captured and not visible to the user if json.loads fails with a JSONDecodeError. To make debugging easier, consider catching json.JSONDecodeError and logging or raising an exception that includes proc.stderr and proc.stdout.

Owner Author


Fixed in cd32275: _run_scala_oracle_payload now catches json.JSONDecodeError and raises a RuntimeError that includes proc.stderr and proc.stdout (via SCALA_ORACLE_DECODE_FAILED), so a compile error or changed launcher banner is surfaced instead of a context-free decode failure.
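The shape of that fix can be sketched as follows. This is an assumption-laden illustration rather than the code in cd32275: `decode_oracle_payload` and the `DECODE_FAILED` template are hypothetical stand-ins for `_run_scala_oracle_payload` and `SCALA_ORACLE_DECODE_FAILED`, whose exact wording is not shown in this thread.

```python
import json

# Hypothetical message template, standing in for SCALA_ORACLE_DECODE_FAILED.
DECODE_FAILED = (
    "scala oracle did not emit valid JSON\nstderr:\n{stderr}\nstdout:\n{stdout}"
)


def decode_oracle_payload(stdout: str, stderr: str) -> dict:
    """Parse the oracle's stdout, surfacing both streams if it is not JSON."""
    try:
        return json.loads(stdout or "{}")
    except json.JSONDecodeError as exc:
        # Chain the original error so the decode position is still available.
        raise RuntimeError(DECODE_FAILED.format(stderr=stderr, stdout=stdout)) from exc
```

Because the subprocess runs with capture_output=True, the captured streams are the only place a compile error or launcher banner shows up, which is why both are folded into the message.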

greptile-apps Bot commented Jul 1, 2026

Contributor

Greptile Summary

This PR adds a Scala retrieval eval and updates Scala call extraction. The main changes are:

  • New scalameta-based Scala oracle run through scala-cli.
  • New evals.scala_retrieval CLI for scoring cgr Scala CALLS edges.
  • Scala infix operator call names extracted from the tree-sitter operator field.
  • Bare paren-less selects excluded from both cgr and oracle grading.
  • Fixture tests and docs for the Scala retrieval workflow.

Confidence Score: 4/5

The Scala retrieval evaluation and call extraction changes are narrowly scoped, with attention needed only for comment-policy compliance in the new oracle file.

The functional changes are covered by the added Scala retrieval fixture tests and described broader test runs; the remaining concern is a project-style hook violation in explanatory inline comments.

evals/oracles/scala_oracle/Oracle.scala

T-Rex Logs

What T-Rex did

  • Compared the scala-eval before and after logs to verify changes, noting that the after state includes a head commit, absence of scala-cli, and a stubbed execution proof for score and diff outputs.
  • Reviewed the scala-calls before and after logs, observing that the infix_expression 't > T.make()' now yields '>' while the field_expression 'u.done' remains None.

Ran code and verified through T-Rex

Comments Outside Diff (1)

  1. General comment

    P1 Scala bare nullary field_expression calls are still dropped

    • Bug
      • The PR contract says Scala-gated handling should make paren-less nullary field_expression calls expose the field, e.g. u.done should produce a done target and therefore a user-visible CALLS edge when done is a declared first-party method. Executed head behavior still returns None for the parsed field_expression 'u.done', so the call target path continues to drop that edge. Infix operator handling is fixed, but the field-expression half of the described fix is not delivered.
    • Cause
      • codebase_rag/parsers/call_processor.py around the Scala-gated infix case (lines 847-859 on head) explicitly handles only TS_SCALA_INFIX_EXPRESSION and comments that bare field_expression is deliberately not named, leaving Scala nullary selects to fall through to None.
    • Fix
      • Add the promised Scala-gated field_expression target extraction for the appropriate nullary-call scenario, returning the field child text (e.g. done) without affecting other languages; then add/adjust a regression test that exercises u.done and verifies the corresponding CALLS edge or target is present.

    Ran code and verified through T-Rex

Reviews (3): Last reviewed commit: "refactor(scala): scope call retrieval to..." | Re-trigger Greptile

Comment thread evals/oracles/scala_oracle/Oracle.scala Outdated
greptile-apps Bot commented Jul 1, 2026

Contributor

Greptile Summary

This PR adds Scala retrieval evaluation and updates Scala call extraction. The main changes are:

  • New scalameta oracle and scala-cli wrapper for Scala call-edge evaluation.
  • New evals.scala_retrieval driver for scoring cgr Scala CALLS edges against oracle calls.
  • Scala call target extraction for infix expressions and paren-less field expressions.
  • Fixture tests and README coverage for the new Scala retrieval eval.

Confidence Score: 4/5

The Scala eval additions are mostly contained, but the call extraction change can create incorrect graph edges for ordinary field reads.

The review is focused on the changed parser and eval paths, and the reported behavior was confirmed with a targeted Scala indexing case.

codebase_rag/parsers/call_processor.py

T-Rex Logs

What T-Rex did

  • Reproduced the focused Scala field-read over-resolution in the repro harness using Holder.done as a val, Other.done as a first-party def, and Use.readField reading only obj.done; the repro shows an assertion failure where a CALLS edge was emitted for the non-method field read, with logs indicating a trie fallback to scala_field_over_resolve.Other.Other.done.
  • Blocked environment/setup for Scala eval path: base availability failed due to an environment/build dependency issue before any Scala eval could exist, and head availability reported scala-cli None with the import stack failing before oracle/scoring could run; a base missing-tool command also failed with a ModuleNotFoundError for defusedxml before Scala paths could be exercised.
  • Compared Scala infix-call behavior across runs: the base run had no CGR_CALL_EDGES and HAS_INFIX_EDGE was False, while the head run showed a CGR_CALL_EDGES entry for ('T.scala', '>') and HAS_INFIX_EDGE True, with the resolver logging '> -> scala_infix_probe.T.T.~>'; the Python sanity run also reported Python_CALLS and HAS_HELPER_NAME True.

Ran code and verified through T-Rex

Comments Outside Diff (1)

  1. General comment

    P1 Scala retrieval CLI can traceback before the clean missing-tool exit

    • Bug
      • Running python -m evals.scala_retrieval on head with PATH set so scala-cli cannot be found does not reach the advertised clean Typer exit. The module import first executes evals/__init__.py, which imports the broader CGR graph stack and fails on missing transitive dependencies (defusedxml in the captured run), producing an unhandled traceback before main() checks scala_available(). This violates the claimed CLI/toolchain behavior for absent scala-cli.
    • Cause
      • The CLI module is executed as part of the evals package, whose package initialization eagerly imports cgr_graph/codebase_rag dependencies before evals.scala_retrieval.main() can perform the explicit scala_available() check and raise typer.Exit.
    • Fix
      • Avoid eager heavyweight imports on the python -m evals.scala_retrieval path before the missing-scala-cli guard. For example, make evals/__init__.py lazy/minimal, or move CGR-dependent imports inside functions after the scala_available() check so the CLI can report the missing Scala toolchain cleanly.
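The second option in the suggested fix (deferring heavy imports until after the toolchain check) can be sketched like this. It is a minimal illustration with hypothetical names: `main` takes an injectable `which` only so the example is testable, `_run_eval` stands in for the real scoring path, and returning exit code 0 for the missing-tool case is an assumption about what "clean exit" means here. It also does not address evals/__init__.py itself, which would still need to stay light for `python -m` to reach this guard.

```python
import shutil
import sys
from typing import Callable, Optional


def _run_eval() -> int:
    """Placeholder for the real eval; heavy imports belong inside this function."""
    # Stand-in for a heavyweight dependency that should not load at import time.
    import json  # noqa: F401

    return 0


def main(which: Callable[[str], Optional[str]] = shutil.which) -> int:
    """Check for scala-cli before touching anything expensive."""
    if which("scala-cli") is None:
        print("scala-cli not found on PATH; skipping Scala retrieval eval", file=sys.stderr)
        return 0  # assumed: a missing toolchain is a clean skip, not a failure
    return _run_eval()
```

Keeping the guard ahead of every non-stdlib import means a missing transitive dependency cannot raise before the user sees the missing-toolchain message.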

    Ran code and verified through T-Rex

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
codebase_rag/parsers/call_processor.py:859-862
**Field accesses over-resolve**
Returning the bare `field` name for every Scala `field_expression` makes non-call field reads resolve through `resolve_function_call`'s simple-name fallback. When `obj.done` is a `val` or field and any first-party `def done` exists, this emits a false `CALLS` edge even though no method was invoked.

Reviews (2): Last reviewed commit: "fix(scala): name infix-operator and pare..." | Re-trigger Greptile

Comment thread codebase_rag/parsers/call_processor.py Outdated
vitali87 commented Jul 1, 2026

Owner Author

@greptile review

vitali87 commented Jul 1, 2026

Owner Author

On the remaining 'bare field_expression calls still dropped' point: this is intentional, and it directly implements the earlier P1 review feedback. Scala's uniform access makes a nullary method call (obj.done) and a plain field read (obj.done where done is a val) syntactically identical, so naming a bare field_expression/Term.Select as a call would turn a same-named field read into a spurious CALLS edge (the exact over-resolution the first review flagged P1 on both the oracle and cgr sides). The two are irreducible without receiver-type inference, so both cgr's extractor and the oracle deliberately grade application/infix sites only. Empirically this costs no recall (bare-select resolution added zero edges on scala-parser-combinators and scopt; recall 0.73->0.77, precision 1.0). test_scala_retrieval_eval.py pins the decision with assert ('Use.scala','done') not in edges. Delivering the field_expression edge and avoiding the field-read FP are mutually exclusive here; this PR chooses precision, consistent with the P1.
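The trade-off can be made concrete with a toy model of simple-name resolution. Everything here is hypothetical (the `DEFS` table, the site tuples, and the `name_bare_selects` switch); it only illustrates why a receiver-blind fallback turns a field read into an edge, and is not cgr's resolver.

```python
from typing import Optional

# Simple name -> qualified names of first-party defs with that name (hypothetical).
DEFS = {"done": ["demo.Other.done"], "run": ["demo.Other.run"]}


def resolve_by_simple_name(name: str) -> Optional[str]:
    """Receiver-blind fallback: any declared def with this name is a match."""
    candidates = DEFS.get(name, [])
    return candidates[0] if candidates else None


def call_edges(sites: list[tuple[str, str, str]], name_bare_selects: bool) -> set[tuple[str, str]]:
    """Build (caller_file, callee_simple_name) edges from (file, kind, name) sites."""
    edges = set()
    for caller_file, kind, name in sites:
        if kind == "select" and not name_bare_selects:
            continue  # bare selects are not treated as calls
        if resolve_by_simple_name(name) is not None:
            edges.add((caller_file, name))
    return edges


# In Use.scala, `obj.done` reads a val on Holder, yet it has the same syntactic
# shape as a nullary call; `obj.run()` is a real application.
sites = [("Use.scala", "select", "done"), ("Use.scala", "apply", "run")]

with_selects = call_edges(sites, name_bare_selects=True)
without_selects = call_edges(sites, name_bare_selects=False)
```

With bare selects named, the `val` read picks up an edge to the unrelated `Other.done`; with them skipped, only the real application survives, which matches the pinned assertion that ('Use.scala', 'done') is not in the edges.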

@vitali87 vitali87 merged commit 0b4fab5 into main Jul 1, 2026
2 of 15 checks passed
@vitali87 vitali87 deleted the eval/scala-retrieval branch July 1, 2026 22:44