feat(parsers): extract access modifiers and decorators via highlights.scm by ChetanyaRathi · Pull Request #566 · vitali87/code-graph-rag

ChetanyaRathi · 2026-07-01T20:33:27Z

Closes #525. Advances #521.

Generalizes access-modifier and decorator extraction from Java-only to a single
shared, highlights.scm-driven path for all languages.

- New shared extract_modifiers_and_decorators in parsers/utils.py, loaded via
  parser_loader.py; populates modifiers: list[str] and decorators: list[str] on
  Function/Method/Class nodes (empty list when absent).
- Refactored the per-language handlers (java, js_ts, php, rust, python) onto the
  shared path, removing the bespoke logic; kept the handler Protocol consistent.
- Added per-language extraction tests.

Testing: full unit suite green on CI targets; ruff check + ruff format clean;
ty check codebase_rag (--exclude tests) clean. Local Windows shows only
OS-specific failures (path separators, symlink privileges, cp1252 encoding,
libclang) that pass on Linux CI.

…ghlights
- Append custom fallback decorator highlights for languages missing them in upstream tree-sitter packages.
- Expand modifier extraction to check wrapper nodes (like decorated_definition).
- Remove obsolete decorator extraction tests.

….scm

…guage extraction tests

…lers and update ingest tests

gemini-code-assist

Code Review

This pull request refactors decorator and modifier extraction across multiple languages by replacing language-specific handler methods with a unified utility that leverages tree-sitter highlights queries. It also updates the schema to store modifiers for classes, functions, and methods. The review feedback highlights three key issues: a bug in query loading where a module import failure incorrectly skips fallback queries, noisy modifier extraction that captures definition keywords like def or class, and a parsing failure in _decorator_tail_names when handling decorators with arguments.

gemini-code-assist · 2026-07-01T20:37:00Z

+    try:
+        module_name = f"{cs.TREE_SITTER_MODULE_PREFIX}{lang_name.replace('-', '_')}"
+        module = importlib.import_module(module_name)
+
+        query_str = ""
+        if hasattr(module, "HIGHLIGHTS_QUERY"):
+            query_str = module.HIGHLIGHTS_QUERY
+
+        fallback_path = (
+            Path(__file__).parent / "queries" / "highlights" / f"{lang_name}.scm"
+        )
+        if fallback_path.exists():
+            custom_queries = fallback_path.read_text(encoding="utf-8")
+            query_str = (
+                query_str + "\n" + custom_queries if query_str else custom_queries
+            )
+
+        if query_str:
+            return Query(language, query_str)
+    except Exception as e:
+        logger.debug(f"Failed to load highlights query for {lang_name}: {e}")
+    return None


If the tree-sitter module import fails (e.g., ModuleNotFoundError or ImportError), the entire try block is aborted, which completely skips loading the fallback highlights query from fallback_path. This is a major bug because fallback queries should still be loaded even if the module import fails (or if the module is loaded from a submodule where the bindings path is no longer in sys.path). Separating the module import/query extraction from the fallback path loading using separate try-except blocks is much more robust and correct.

    query_str = ""
    try:
        module_name = f"{cs.TREE_SITTER_MODULE_PREFIX}{lang_name.replace('-', '_')}"
        module = importlib.import_module(module_name)
        if hasattr(module, "HIGHLIGHTS_QUERY"):
            query_str = module.HIGHLIGHTS_QUERY
    except Exception as e:
        logger.debug(f"Failed to import tree-sitter module for {lang_name}: {e}")

    try:
        fallback_path = (
            Path(__file__).parent / "queries" / "highlights" / f"{lang_name}.scm"
        )
        if fallback_path.exists():
            custom_queries = fallback_path.read_text(encoding="utf-8")
            query_str = (
                query_str + "\n" + custom_queries if query_str else custom_queries
            )

        if query_str:
            return Query(language, query_str)
    except Exception as e:
        logger.debug(f"Failed to load highlights query for {lang_name}: {e}")
    return None

gemini-code-assist · 2026-07-01T20:37:00Z

+def extract_modifiers_and_decorators(
+    node: ASTNode, lang_queries: LanguageQueries
+) -> tuple[list[str], list[str]]:
+    query = lang_queries.get("highlights")
+    if not query:
+        return [], []
+
+    cursor = get_query_cursor(query)
+
+    body_node = node.child_by_field_name("body")
+    header_end_byte = body_node.start_byte if body_node else node.end_byte
+
+    target_node = node
+    if node.parent and node.parent.type in ("decorated_definition", "export_statement"):
+        target_node = node.parent
+
+    cursor.set_byte_range(target_node.start_byte, header_end_byte)
+
+    captures = sorted_captures(cursor, target_node)
+
+    modifiers: list[str] = []
+    decorators: list[str] = []
+
+    for name, nodes in captures.items():
+        if name.startswith("keyword.modifier") or name == "keyword":
+            for n in nodes:
+                text = safe_decode_text(n)
+                if text and text not in modifiers:
+                    modifiers.append(text)
+        elif name.startswith("attribute") or name.startswith("function.decorator"):
+            for n in nodes:
+                text = safe_decode_text(n)
+                if text and text not in decorators:
+                    decorators.append(text)
+
+    return modifiers, decorators


The code extracts any keyword captured as @keyword or @keyword.modifier as a modifier. However, standard highlights queries capture definition keywords (like def, class, fn, struct, impl, interface, enum, function, trait, type) and literal keywords (like None, True, False, null, true, false, void) as @keyword. This results in noisy and incorrect modifiers (e.g., modifiers=["def"] for Python functions, modifiers=["class"] for Python classes, modifiers=["fn"] for Rust functions, etc.). We should define a set of excluded keywords and filter them out when extracting modifiers.

_EXCLUDED_KEYWORDS = frozenset({
    "def", "class", "fn", "struct", "impl", "interface", "enum", "function",
    "trait", "type", "void", "None", "True", "False", "null", "true", "false",
    "return", "import", "from", "as", "where",
})


def extract_modifiers_and_decorators(
    node: ASTNode, lang_queries: LanguageQueries
) -> tuple[list[str], list[str]]:
    query = lang_queries.get("highlights")
    if not query:
        return [], []

    cursor = get_query_cursor(query)

    body_node = node.child_by_field_name("body")
    header_end_byte = body_node.start_byte if body_node else node.end_byte

    target_node = node
    if node.parent and node.parent.type in ("decorated_definition", "export_statement"):
        target_node = node.parent

    cursor.set_byte_range(target_node.start_byte, header_end_byte)

    captures = sorted_captures(cursor, target_node)

    modifiers: list[str] = []
    decorators: list[str] = []

    for name, nodes in captures.items():
        if name.startswith("keyword.modifier") or name == "keyword":
            for n in nodes:
                text = safe_decode_text(n)
                if text and text not in modifiers and text not in _EXCLUDED_KEYWORDS:
                    modifiers.append(text)
        elif name.startswith("attribute") or name.startswith("function.decorator"):
            for n in nodes:
                text = safe_decode_text(n)
                if text and text not in decorators:
                    decorators.append(text)

    return modifiers, decorators

Fixed in commit 8191a09

gemini-code-assist · 2026-07-01T20:37:00Z

 def _decorator_tail_names(decorators: list[str]) -> set[str]:
    return {
-        decorator.lstrip(cs.DECORATOR_AT).split(cs.SEPARATOR_DOT)[-1]
+        decorator.lstrip("@#[]() ").split(cs.SEPARATOR_DOT)[-1].rstrip(")]")
        for decorator in decorators
    }


The code decorator.lstrip("@#[]() ").split(cs.SEPARATOR_DOT)[-1].rstrip(")]") fails to correctly extract the bare name of decorators with arguments (e.g., @cached_property(ttl=3600) or @Component({selector: 'app-root'})). For @cached_property(ttl=3600), it returns cached_property(ttl=3600, which fails to match cached_property in cs.PROPERTY_DECORATORS. We should split on ( first to strip any arguments before extracting the tail name.

Suggested change

-def _decorator_tail_names(decorators: list[str]) -> set[str]:
-    return {
-        decorator.lstrip("@#[]() ").split(cs.SEPARATOR_DOT)[-1].rstrip(")]")
-        for decorator in decorators
-    }
+def _decorator_tail_names(decorators: list[str]) -> set[str]:
+    return {
+        decorator.lstrip("@#[]() ").split("(")[0].split(cs.SEPARATOR_DOT)[-1].rstrip(")] ")
+        for decorator in decorators
+    }

References

When parsing decorators, annotations, or attributes, extract the full text including arguments, not just the name. This preserves crucial semantic information (e.g., arguments in @RequestMapping(value="/api") or #[derive(Debug)]) for RAG queries and ensures consistency across all supported languages (Python, Java, Rust, TypeScript).
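To make the effect of the suggested change concrete, here is a minimal standalone sketch of the same string handling. It assumes cs.SEPARATOR_DOT is "." (replaced by a local constant here), and the sample decorator strings are illustrative rather than taken from the test suite.

```python
SEPARATOR_DOT = "."  # assumption: stand-in for cs.SEPARATOR_DOT


def decorator_tail_names(decorators: list[str]) -> set[str]:
    """Return the bare trailing name of each decorator, ignoring arguments."""
    names: set[str] = set()
    for decorator in decorators:
        # Drop leading sigils such as "@" or "#[" and stray whitespace.
        head = decorator.lstrip("@#[]() ")
        # Cut at the first "(" so argument text never reaches the dot split.
        head = head.split("(")[0]
        # Keep the last dotted component and trim a leftover "]" or ")".
        names.add(head.split(SEPARATOR_DOT)[-1].rstrip(")] "))
    return names


if __name__ == "__main__":
    samples = [
        "@cached_property(ttl=3600)",
        "@functools.cache",
        "#[derive(Debug)]",
        "#[test]",
    ]
    print(sorted(decorator_tail_names(samples)))
```

Splitting on "(" before splitting on the dot matters because arguments may themselves contain dots (for example a path string), which would otherwise be mistaken for a module separator. The stored decorators list can still keep the full text with arguments, as the reference rule above asks; only the name lookup strips them.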

greptile-apps · 2026-07-01T20:38:34Z

Greptile Summary

This PR moves modifier and decorator extraction into a shared highlights-query path. The main changes are:

- Adds highlights queries to parser loading and language query metadata.
- Populates modifiers and decorators on function, method, and class nodes.
- Removes per-language handler decorator extraction.
- Adds highlight query files for Java, JavaScript, PHP, Python, Rust, and TypeScript.
- Updates tests and semantic search metadata for the new fields.

Confidence Score: 3/5

The shared extraction refactor has targeted regressions that can silently drop decorator metadata for supported languages.

The implementation is broadly covered by language tests, but the highlighted code paths leave Rust sibling attributes and vendored-grammar fallback highlights unhandled.

codebase_rag/parsers/utils.py and codebase_rag/parser_loader.py

T-Rex Logs

What T-Rex did

- Reproduced the Rust attribute ingestion issue by running a focused ingestion script against a generated Rust crate; the run loaded the real Rust tree-sitter parser and ingested lib.rs, showing that Foo and function_attr were preserved while method_attr had an empty decorators array, and the repro assertion failed.
- Reproduced that the fallback highlights query is unreachable when the tree_sitter_python import fails, even though the local fallback exists; a focused Python repro monkeypatched the parser loading, and the process exited with code 1 after demonstrating the failure.
- Compared decorator extraction across the pre- and post-change runs, finding that tsFunc and tsMethod have empty decorators while Java/PHP nodes contain nested duplicate decorator entries.
- Compared modifier extraction results for Java and PHP, observing modifiers_present=true for the relevant nodes in the after-run, and noting that plain classes and the PHP method noModFunc also receive explicit modifiers, which conflicts with the empty-list expectation.

Ran code and verified through T-Rex

Comments Outside Diff (3)

General comment

TypeScript function and method decorators are not extracted
- Bug
  - The head run creates TypeScript Function node tsFunc and Method node tsMethod, but both have decorators: [] despite source decorators @dec and @dec2(...). The decorated TypeScript class is populated, so this is a partial node-kind contract failure for TypeScript Function/Method nodes.
- Cause
  - The shared extract_modifiers_and_decorators path only widens the extraction target for parents of type decorated_definition or export_statement. In the TypeScript grammar, decorators for methods/functions are not captured within the byte range/node shape used for those node kinds, so codebase_rag/queries/highlights/typescript.scm capture (decorator) @function.decorator is not sufficient for these Function/Method nodes.
- Fix
  - Adjust the TypeScript extraction path to include the actual decorated parent/wrapper for method/function declarations, or otherwise extend the node/range selection in codebase_rag/parsers/utils.py and/or TypeScript handler logic so decorators preceding TypeScript functions and methods are included in the highlights query range. Add ingestion tests that assert TypeScript class, method, function, and undecorated nodes' exact decorators values.
Ran code and verified through T-Rex
General comment

Java and PHP decorator lists include duplicate nested captures
- Bug
  - Head populates Java/PHP decorator fields, but not accurately as list[str]: Java class/method nodes include both full annotations and inner identifiers, e.g. ['@Ann', '@Ann2("cls")', 'Ann', 'Ann2']; PHP class/function/method nodes include both full attribute groups and inner attributes, e.g. ['#[Attr]', 'Attr', "#[Attr('func')]", "Attr('func')"]. This violates the user-visible contract for accurate decorator lists.
- Cause
  - The highlights queries capture both outer and nested decorator/attribute nodes (java.scm captures both marker_annotation and annotation; php.scm captures both attribute_group and attribute), and extract_modifiers_and_decorators appends every unique captured text without filtering contained captures or choosing a single canonical representation.
- Fix
  - Make decorator extraction de-duplicate nested captures by span containment or query only canonical outer decorator nodes. For Java, keep full annotation text only; for PHP, choose a stable representation (prefer full #[...] attribute group text if that is the public contract) and suppress contained attribute captures. Add exact-value ingestion tests for Java/PHP Class, Function where applicable, and Method nodes.
Ran code and verified through T-Rex
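One way to read the "de-duplicate nested captures by span containment" fix is sketched below. It works on plain (start, end, text) tuples instead of tree-sitter nodes, so the Capture type, the function name, and the byte offsets are all hypothetical and only illustrate the idea.

```python
from typing import NamedTuple


class Capture(NamedTuple):
    start: int  # start byte of the captured node
    end: int    # end byte (exclusive)
    text: str


def outermost_captures(captures: list[Capture]) -> list[str]:
    """Keep only captures that are not contained in another capture."""
    # Sort by start ascending, then by end descending, so an outer span
    # is always visited before the spans nested inside it.
    ordered = sorted(set(captures), key=lambda c: (c.start, -c.end))
    kept: list[Capture] = []
    for cap in ordered:
        last = kept[-1] if kept else None
        if last is not None and last.start <= cap.start and cap.end <= last.end:
            continue  # nested inside the most recent outer span
        kept.append(cap)
    return [c.text for c in kept]


if __name__ == "__main__":
    # Offsets mimic a Java class preceded by "@Ann\n@Ann2(\"cls\")".
    java = [
        Capture(0, 4, "@Ann"),
        Capture(1, 4, "Ann"),
        Capture(5, 17, '@Ann2("cls")'),
        Capture(6, 10, "Ann2"),
    ]
    print(outermost_captures(java))
```

Comparing against only the last kept span is enough here because syntax-tree nodes are either nested or disjoint. The alternative named in the comment, querying only the canonical outer node in java.scm and php.scm, avoids the post-filter entirely.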
General comment

Head records class/function syntax keywords as modifiers instead of empty lists when no modifiers are present
- Bug
  - In the head ingestion output, nodes with no actual access/language modifiers still receive non-empty modifiers lists containing declaration keywords. The Java plain class has modifiers ["class"], the PHP plain class has modifiers ["class"], and the PHP method declared as function noModFunc() has modifiers ["function"]. The contract under validation says Function, Method, and Class nodes should have modifiers: list[str] with empty lists when absent, generalizing access-modifier extraction. These values are not modifiers and prevent consumers from distinguishing an unmodified declaration from one with real modifiers.
- Cause
  - The shared highlights.scm-driven extraction path appears to accept broad highlight captures for declaration keywords, not just actual modifier/access-modifier captures, so structural syntax tokens captured from highlights are stored in the node modifiers property.
- Fix
  - Filter extracted modifier captures to real modifier tokens for each language/node kind, or refine the highlights/local query captures so class and function keywords are not returned as modifiers. Add regression coverage asserting that plain Java/PHP classes and PHP methods without access/static/final modifiers produce modifiers: [].
Ran code and verified through T-Rex
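The "filter to real modifier tokens for each language" option could look roughly like the allowlist below. This is a sketch, not what the PR implements (commit 8191a09 is described as filtering definition keywords): the MODIFIERS_BY_LANGUAGE table, its contents, and the function name are assumptions, and the sets are deliberately non-exhaustive.

```python
from __future__ import annotations

# Illustrative, non-exhaustive allowlists; the PR does not define these sets.
MODIFIERS_BY_LANGUAGE: dict[str, frozenset[str]] = {
    "java": frozenset(
        {"public", "protected", "private", "static", "final", "abstract",
         "synchronized", "native"}
    ),
    "php": frozenset(
        {"public", "protected", "private", "static", "final", "abstract",
         "readonly"}
    ),
}


def filter_modifiers(language: str, captures: dict[str, list[str]]) -> list[str]:
    """Return allowlisted modifier tokens, in first-seen order, without duplicates."""
    allowed = MODIFIERS_BY_LANGUAGE.get(language, frozenset())
    modifiers: list[str] = []
    for name, texts in captures.items():
        # Only keyword-style captures can carry modifiers.
        if name != "keyword" and not name.startswith("keyword.modifier"):
            continue
        for text in texts:
            if text in allowed and text not in modifiers:
                modifiers.append(text)
    return modifiers


if __name__ == "__main__":
    print(filter_modifiers("java", {"keyword": ["class"]}))
    print(filter_modifiers("php", {"keyword": ["public", "static", "function"]}))
```

An allowlist fails closed: a language with no entry yields an empty list instead of leaking declaration keywords, at the cost of having to maintain one set per language. A denylist is easier to share across languages but lets any unlisted keyword through.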

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
codebase_rag/parsers/utils.py:102-106
**Rust attributes are skipped**
For Rust, outer attributes like `#[test]` and `#[derive(Debug)]` are sibling `attribute_item` nodes before the function or class node, not children of it. This range starts at `node.start_byte` and queries only `target_node`, so the shared extractor never sees those siblings. Rust methods and classes now get empty `decorators` where the removed handler walked `prev_named_sibling`.

### Issue 2 of 2
codebase_rag/parser_loader.py:237-255
**Fallback query is unreachable**
The local `queries/highlights/*.scm` fallback is inside the same `try` block after `importlib.import_module(module_name)`. When a grammar is loaded from the vendored submodule path instead of an installed `tree_sitter_*` package, that import fails after the path is removed, so the checked-in highlights query is never read and `highlights` becomes `None`. Modifier and decorator extraction silently disables for that language.

Reviews (1): Last reviewed commit: "fix(parsers): make extract_decorators pr..."

greptile-apps · 2026-07-01T20:38:39Z

+    target_node = node
+    if node.parent and node.parent.type in ("decorated_definition", "export_statement"):
+        target_node = node.parent
+
+    cursor.set_byte_range(target_node.start_byte, header_end_byte)


Rust attributes are skipped
For Rust, outer attributes like #[test] and #[derive(Debug)] are sibling attribute_item nodes before the function or class node, not children of it. This range starts at node.start_byte and queries only target_node, so the shared extractor never sees those siblings. Rust methods and classes now get empty decorators where the removed handler walked prev_named_sibling.

Artifacts

Repro: focused Rust attribute ingestion script

Contains supporting evidence from the run (text/x-python; charset=utf-8).

Repro: failing parser ingestion output showing empty method decorators

Keeps the command output available without making the summary code-heavy.

Ran code and verified through T-Rex
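The sibling walk that the removed Rust handler performed can be sketched as follows. FakeNode is a hypothetical stand-in that exposes only the node fields used here (type, start_byte, end_byte, prev_named_sibling), so the example runs without tree-sitter; the helper names are assumptions and this is not the code from the follow-up commit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for the few syntax-node fields used below."""
    type: str
    start_byte: int
    end_byte: int
    prev_named_sibling: FakeNode | None = None


def leading_attribute_nodes(node: FakeNode) -> list[FakeNode]:
    """Collect consecutive attribute_item siblings before node, in source order."""
    attrs: list[FakeNode] = []
    sibling = node.prev_named_sibling
    while sibling is not None and sibling.type == "attribute_item":
        attrs.append(sibling)
        sibling = sibling.prev_named_sibling
    attrs.reverse()  # the walk goes backwards, so restore source order
    return attrs


def extraction_start_byte(node: FakeNode) -> int:
    """Start of the range that should be scanned for decorators."""
    attrs = leading_attribute_nodes(node)
    return attrs[0].start_byte if attrs else node.start_byte


if __name__ == "__main__":
    derive = FakeNode("attribute_item", 0, 16)
    test = FakeNode("attribute_item", 17, 24, prev_named_sibling=derive)
    func = FakeNode("function_item", 25, 40, prev_named_sibling=test)
    print(extraction_start_byte(func))
```

Widening only the byte range is probably not sufficient in the real extractor: the captures would also have to be collected from a node that contains the siblings (for instance the parent), since the current code queries target_node alone.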

greptile-apps · 2026-07-01T20:38:40Z

+        module_name = f"{cs.TREE_SITTER_MODULE_PREFIX}{lang_name.replace('-', '_')}"
+        module = importlib.import_module(module_name)
+
+        query_str = ""
+        if hasattr(module, "HIGHLIGHTS_QUERY"):
+            query_str = module.HIGHLIGHTS_QUERY
+
+        fallback_path = (
+            Path(__file__).parent / "queries" / "highlights" / f"{lang_name}.scm"
+        )
+        if fallback_path.exists():
+            custom_queries = fallback_path.read_text(encoding="utf-8")
+            query_str = (
+                query_str + "\n" + custom_queries if query_str else custom_queries
+            )
+
+        if query_str:
+            return Query(language, query_str)
+    except Exception as e:


Fallback query is unreachable
The local queries/highlights/*.scm fallback is inside the same try block after importlib.import_module(module_name). When a grammar is loaded from the vendored submodule path instead of an installed tree_sitter_* package, that import fails after the path is removed, so the checked-in highlights query is never read and highlights becomes None. Modifier and decorator extraction silently disables for that language.

Artifacts

Repro: focused script that simulates unavailable tree_sitter_python while local highlights fallback exists

Contains supporting evidence from the run (text/x-python; charset=utf-8).

Repro: runtime output showing import failure prevents fallback highlights query loading

Keeps the command output available without making the summary code-heavy.

Ran code and verified through T-Rex
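A small standalone script can exercise this failure mode without tree-sitter installed. The sketch below is not the T-Rex artifact: load_highlights_source is a hypothetical helper that returns the combined query text (instead of constructing a Query), and the module name is deliberately one that cannot be imported.

```python
from __future__ import annotations

import importlib
import tempfile
from pathlib import Path


def load_highlights_source(module_name: str, fallback_path: Path) -> str | None:
    """Combine a grammar package's HIGHLIGHTS_QUERY with a local .scm file."""
    query_str = ""
    try:
        module = importlib.import_module(module_name)
        query_str = getattr(module, "HIGHLIGHTS_QUERY", "") or ""
    except ImportError:
        # A missing grammar package must not prevent the local fallback.
        pass

    try:
        if fallback_path.exists():
            custom = fallback_path.read_text(encoding="utf-8")
            query_str = f"{query_str}\n{custom}" if query_str else custom
    except OSError:
        # An unreadable fallback file leaves whatever the package provided.
        pass

    return query_str or None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scm = Path(tmp) / "python.scm"
        scm.write_text("(decorator) @function.decorator\n", encoding="utf-8")
        # Import fails, yet the checked-in query text is still returned.
        print(load_highlights_source("tree_sitter_does_not_exist_xyz", scm))
```

With the single try block from the diff, the same scenario would return None, which is the silent disablement described in this comment.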

…nition keywords, strip decorator args, capture Rust sibling attributes

ChetanyaRathi · 2026-07-01T21:27:37Z

Heads up: CI didn't run here — all jobs show "The job was not started because your account is locked due to a billing issue," which looks like a repo-level Actions billing problem rather than anything in this PR. Locally the full unit suite passes (only Windows-specific path/symlink/encoding tests fail, which pass on Linux CI), ruff check + format are clean, and ty check codebase_rag is clean. Happy to re-run once Actions is available.

Bot added 4 commits July 1, 2026 15:39

feat(parsers): extract access modifiers and decorators via highlights…

0f23294

….scm

test(parsers): restore decorator protocol consistency and add per-lan…

abe1379

…guage extraction tests

fix(parsers): make extract_decorators protocol consistent across hand…

97f253e

…lers and update ingest tests

ChetanyaRathi requested a review from vitali87 as a code owner July 1, 2026 20:33

github-project-automation Bot added this to @vitali87's graph code Jul 1, 2026

gemini-code-assist Bot reviewed Jul 1, 2026

greptile-apps Bot reviewed Jul 1, 2026

fix(parsers): address review - separate fallback loading, filter defi…

8191a09

…nition keywords, strip decorator args, capture Rust sibling attributes

ChetanyaRathi changed the title ~~Feat/525 highlights modifiers decorators~~ feat(parsers): extract access modifiers and decorators via highlights.scm Jul 1, 2026

feat(parsers): extract access modifiers and decorators via highlights.scm #566

ChetanyaRathi wants to merge 5 commits into vitali87:main from ChetanyaRathi:feat/525-highlights-modifiers-decorators
