Conversation
Demonstrates ArcadeDB's multi-model capabilities for IAM:
- Graph traversal: permission resolution, shadow admin detection, SoD violations
- Time-series: SOX compliance audit logs, dormant access detection
- Vector similarity: behavioral anomaly detection via access pattern embeddings
- PostgreSQL wire protocol: Python client using psycopg
Includes 7 query patterns across 3 runners (curl, Java, Python).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes
This pull request significantly enhances the ArcadeDB examples by introducing a robust Identity & Access Management (IAM) use case. It serves as a practical demonstration of how ArcadeDB's multi-model features can be leveraged to build sophisticated security and compliance systems, integrating graph-based access control, time-series auditing, and vector-based anomaly detection within a single database. Furthermore, it expands the client ecosystem by providing a Python implementation, showcasing interoperability via the PostgreSQL wire protocol.
Test comment from Claude review |
Code Review
This pull request introduces a comprehensive new use case for Identity & Access Management (IAM), showcasing ArcadeDB's multi-model capabilities. The changes include new documentation, design plans, and a full implementation with runners for shell, Java, and Python. The overall structure is excellent and follows the established conventions of the repository. My review focuses on improving the robustness and security of the provided scripts and code. Specifically, I've identified opportunities to prevent potential SQL injection vulnerabilities in the Java and Python code by using parameterized queries. I've also suggested making the shell query script more dynamic to avoid hardcoded values, which will make the demonstration more robust and impressive. Additionally, there are some minor inconsistencies in the planning documents that should be corrected.
String resourceList = soxResources.stream()
    .map(n -> "'" + n.replace("'", "''") + "'")
    .collect(java.util.stream.Collectors.joining(", "));

String logSql = String.format(
    """
    SELECT identityEmail, action, resourceName, recordedAt, source_ip
    FROM AccessLog
    WHERE resourceName IN [%s]
      AND recordedAt > '2025-12-01 00:00:00'
    ORDER BY recordedAt DESC""", resourceList);

try (ResultSet rs = db.query("sql", logSql)) {
The SQL query in runQuery3SoxComplianceAudit is constructed using String.format, which is vulnerable to SQL injection. It's a best practice to use parameterized queries, which are supported by the ArcadeDB Java driver. This improves security and code clarity.
String logSql =
"""
SELECT identityEmail, action, resourceName, recordedAt, source_ip
FROM AccessLog
WHERE resourceName IN :resources
AND recordedAt > '2025-12-01 00:00:00'
ORDER BY recordedAt DESC""";
try (ResultSet rs = db.query("sql", logSql, java.util.Map.of("resources", soxResources))) {
ArcadeDB's SQL dialect doesn't support IN :paramName with named parameter binding for lists. The RemoteDatabase.query() method accepts a Map<String, Object> but the SQL parser doesn't interpolate list parameters into IN clauses.
The values here come from our own prior query result (SOX-governed resource names), not from user input, so the injection surface is theoretical. Keeping the current String.format approach since it's the only way that works with ArcadeDB's SQL.
resource_list = ', '.join(f"'{r}'" for r in sox_resources)
cur.execute(f"""
    SELECT identityEmail, action, resourceName, recordedAt, source_ip
    FROM AccessLog
    WHERE resourceName IN [{resource_list}]
      AND recordedAt > '2025-12-01 00:00:00'
    ORDER BY recordedAt DESC
""")
The SQL query in run_query3 is constructed using an f-string, which makes it vulnerable to SQL injection. psycopg provides a safe way to pass parameters to queries, including lists for IN clauses. Using parameters is a critical security best practice.
cur.execute("""
    SELECT identityEmail, action, resourceName, recordedAt, source_ip
    FROM AccessLog
    WHERE resourceName = ANY(%s)
      AND recordedAt > '2025-12-01 00:00:00'
    ORDER BY recordedAt DESC
""", (sox_resources,))
= ANY(%s) is PostgreSQL-specific syntax. ArcadeDB's PostgreSQL wire protocol emulates the PG protocol but uses its own SQL parser, which requires IN [...] (square brackets, not parentheses) and doesn't support ANY(). The suggested change would fail at query parse time.
The values come from our own prior query result, not user input.
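Since the thread settles on keeping the interpolated `IN [...]` form, a minimal sketch of giving the Python runner the same single-quote escaping as the Java runner might look like the following. The helper name `arcadedb_in_list` and the resource name `O'Brien-Vault` are hypothetical, introduced only for illustration. Doubling the quote is assumed to be the right escape because that is what the Java snippet above does.

```python
def arcadedb_in_list(values):
    """Render strings as a bracketed list literal, e.g. ['a', 'b'].

    Hypothetical helper: it doubles embedded single quotes, mirroring the
    n.replace("'", "''") call in the Java runner shown in this thread.
    """
    quoted = []
    for value in values:
        quoted.append("'" + str(value).replace("'", "''") + "'")
    return "[" + ", ".join(quoted) + "]"


# Sample data; "O'Brien-Vault" is an invented name that exercises the escaping.
sox_resources = ["Production-DB", "Payment-API", "O'Brien-Vault"]

sql = (
    "SELECT identityEmail, action, resourceName, recordedAt, source_ip "
    "FROM AccessLog "
    f"WHERE resourceName IN {arcadedb_in_list(sox_resources)} "
    "AND recordedAt > '2025-12-01 00:00:00' "
    "ORDER BY recordedAt DESC"
)
print(sql)
```

The resulting string would be passed to `cur.execute(sql)` unchanged. This only narrows the quoting gap between the runners; it is not a substitute for parameter binding where the server supports it.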
query "sql" "
  SELECT identityEmail, action, resourceName, recordedAt, source_ip
  FROM AccessLog
  WHERE resourceName IN ['Production-DB', 'Payment-API', 'Audit-System']
    AND recordedAt > '2025-12-01 00:00:00'
  ORDER BY recordedAt DESC
"
The second part of "Query 3: SOX Compliance Audit" uses a hardcoded list of resources: WHERE resourceName IN ['Production-DB', 'Payment-API', 'Audit-System']. This is not robust and will produce incorrect results if the underlying data changes. The script should dynamically use the output from the first part of the query to build the second one.
Here is a way to make it dynamic:
SOX_RESOURCES_JSON=$(query "sql" "
SELECT resource
FROM (
MATCH {type: Resource, as: res}
.out('GOVERNED_BY'){as: pol, where: (name = 'SOX-Compliance')}
RETURN res.name AS resource
)
")
RESOURCE_LIST_JSON=$(echo "$SOX_RESOURCES_JSON" | jq '[.[].resource]')
query "sql" "
SELECT identityEmail, action, resourceName, recordedAt, source_ip
FROM AccessLog
WHERE resourceName IN ${RESOURCE_LIST_JSON}
AND recordedAt > '2025-12-01 00:00:00'
ORDER BY recordedAt DESC
"
Fixed. Query 3 now dynamically builds the resource list from the SOX-governed resources query result using jq.
iam/queries/queries.sh
Outdated
echo "--- Identities with approve permission ---"
query "sql" "
  SELECT identity, resource, role
  FROM (
    MATCH {type: Identity, as: u}
      .out('MEMBER_OF'){while: (\$depth < 3)}
      .out('HAS_ROLE'){as: r}
      .out('GRANTS'){where: (action = 'approve')}
      .out('APPLIES_TO'){as: res}
    RETURN u.email AS identity, res.name AS resource, r.name AS role
  )
"

echo ""
echo "--- Identities with execute permission ---"
query "sql" "
  SELECT identity, resource, role
  FROM (
    MATCH {type: Identity, as: u}
      .out('MEMBER_OF'){while: (\$depth < 3)}
      .out('HAS_ROLE'){as: r}
      .out('GRANTS'){where: (action = 'execute')}
      .out('APPLIES_TO'){as: res}
    RETURN u.email AS identity, res.name AS resource, r.name AS role
  )
"

echo ""
echo "SoD violation: identities appearing in BOTH lists for the same resource (carol@company.com on Payment-API)"
"Query 4: Separation of Duties Violation" is not dynamic. It runs two queries and then prints a hardcoded echo statement with the expected result. For a demonstration script, it's much more effective to compute the result dynamically. You can use jq to find the intersection of the results from the two queries.
Here is a suggested implementation:
APPROVERS=$(query "sql" "
SELECT identity, resource, role
FROM (
MATCH {type: Identity, as: u}
.out('MEMBER_OF'){while: (\$depth < 3)}
.out('HAS_ROLE'){as: r}
.out('GRANTS'){where: (action = 'approve')}
.out('APPLIES_TO'){as: res}
RETURN u.email AS identity, res.name AS resource, r.name AS role
)
")
EXECUTORS=$(query "sql" "
SELECT identity, resource, role
FROM (
MATCH {type: Identity, as: u}
.out('MEMBER_OF'){while: (\$depth < 3)}
.out('HAS_ROLE'){as: r}
.out('GRANTS'){where: (action = 'execute')}
.out('APPLIES_TO'){as: res}
RETURN u.email AS identity, res.name AS resource, r.name AS role
)
")
# Use jq to find violations
jq -n --argjson approvers "$APPROVERS" --argjson executors "$EXECUTORS" '
  ($approvers | map({key: (.identity + "|" + .resource), value: .}) | from_entries) as $approver_map
  | $executors
  | map(
      select($approver_map[.identity + "|" + .resource])
      | "VIOLATION: \(.identity) on \(.resource) | approve via: \($approver_map[.identity + "|" + .resource].role) | execute via: \(.role)"
    )
  | .[]
' | sed 's/"//g'
Fixed. Query 4 now captures both query results and computes the SoD violations dynamically using jq set intersection.
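For readers following the Python runner, the same intersection can be sketched in a few lines. This is an illustration under assumptions, not the code that was merged: the function name `sod_violations` and the role names in the sample rows are hypothetical, and the row shape (`identity`, `resource`, `role`) is taken from the queries above.

```python
def sod_violations(approvers, executors):
    """Return (identity, resource, approve_role, execute_role) tuples for
    every identity/resource pair that appears in both result sets."""
    approve_roles = {}
    for row in approvers:
        approve_roles[(row["identity"], row["resource"])] = row["role"]

    violations = []
    for row in executors:
        key = (row["identity"], row["resource"])
        if key in approve_roles:
            violations.append((key[0], key[1], approve_roles[key], row["role"]))
    return sorted(violations)


# Hypothetical sample rows; only carol@company.com on Payment-API is named
# in the thread, the role names and other identities are invented.
approvers = [
    {"identity": "carol@company.com", "resource": "Payment-API", "role": "Approver"},
    {"identity": "alice@company.com", "resource": "Audit-System", "role": "Approver"},
]
executors = [
    {"identity": "carol@company.com", "resource": "Payment-API", "role": "Operator"},
    {"identity": "dave@company.com", "resource": "Payment-API", "role": "Operator"},
]

for identity, resource, approve_role, execute_role in sod_violations(approvers, executors):
    print(f"VIOLATION: {identity} on {resource} | approve via: {approve_role} | execute via: {execute_role}")
```

Keying on the (identity, resource) pair rather than on identity alone matters here: holding approve on one resource and execute on a different one is not a separation-of-duties conflict.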
iam/queries/queries.sh
Outdated
echo "--- Step 1: Identities with granted permissions ---"
query "sql" "
  SELECT DISTINCT identity, resource
  FROM (
    MATCH {type: Identity, as: u}
      .out('MEMBER_OF'){while: (\$depth < 3)}
      .out('HAS_ROLE'){}
      .out('GRANTS'){}
      .out('APPLIES_TO'){as: res}
    RETURN u.email AS identity, res.name AS resource
  )
  ORDER BY identity
"

echo ""
echo "--- Step 2: Identities with recent access (last 90 days) ---"
query "sql" "
  SELECT DISTINCT identityEmail
  FROM AccessLog
  WHERE recordedAt > '2025-12-06 00:00:00'
  ORDER BY identityEmail
"

echo ""
echo "Dormant identities = Step 1 minus Step 2 (bob@company.com, frank@company.com, svc-backup@company.com)"
"Query 5: Dormant Access Detection" is not dynamic. It runs two queries and then prints a hardcoded echo statement with the expected result. This should be computed dynamically by finding the set difference between users with permissions and users with recent activity.
Here is a way to compute this dynamically using jq:
GRANTED_USERS=$(query "sql" "
SELECT DISTINCT u.email AS identity
FROM (
MATCH {type: Identity, as: u}
.out('MEMBER_OF'){while: (\$depth < 3)}
.out('HAS_ROLE'){}
.out('GRANTS'){}
.out('APPLIES_TO'){as: res}
RETURN u.email
)
")
RECENT_USERS=$(query "sql" "
SELECT DISTINCT identityEmail
FROM AccessLog
WHERE recordedAt > '2025-12-06 00:00:00'
")
echo "Dormant identities (have permissions but no recent access):"
jq -n --argjson granted "$GRANTED_USERS" --argjson recent "$RECENT_USERS" '
  ($granted | map(.identity)) as $g
  | ($recent | map(.identityEmail)) as $r
  | ($g - $r) | .[]
' | sed 's/"//g'
Fixed. Query 5 now captures both query results and computes dormant identities via jq set difference ($granted - $recent).
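The set difference described here can also be sketched in Python. The function name `dormant_identities` is hypothetical and the sample rows are invented to line up with the identities named in this thread; the column names `identity` and `identityEmail` are assumed from the two queries above.

```python
def dormant_identities(granted_rows, recent_rows):
    """Identities that hold at least one permission but have no recent
    AccessLog entry: the set difference granted - recent."""
    granted = {row["identity"] for row in granted_rows}
    recent = {row["identityEmail"] for row in recent_rows}
    return sorted(granted - recent)


# Hypothetical sample rows shaped like the results of Step 1 and Step 2.
granted_rows = [
    {"identity": "alice@company.com"},
    {"identity": "bob@company.com"},
    {"identity": "frank@company.com"},
    {"identity": "svc-backup@company.com"},
]
recent_rows = [
    {"identityEmail": "alice@company.com"},
]

for email in dormant_identities(granted_rows, recent_rows):
    print(email)
```

Using sets also removes duplicates, so the sketch does not depend on `DISTINCT` being applied server-side.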
query "cypher" "
  MATCH (u:Identity {email: 'alice@company.com'})
    -[:MEMBER_OF*1..3]->(g:Group)
    -[:HAS_ROLE]->(r:Role)
    -[:GRANTS]->(p:Permission)
    -[:APPLIES_TO]->(res:Resource)
  RETURN res.name AS resource, p.action AS action,
         r.name AS via_role, g.name AS via_group
  ORDER BY resource, action
"
The implementation plan specifies using cypher for several queries in the queries.sh script (e.g., Query 1, 2, 4, 7). However, the actual implementation in iam/queries/queries.sh and the Java/Python runners uses sql (with SQL MATCH) for all graph queries. The plan should be updated to match the implementation for consistency.
The plan doc is a historical artifact from before implementation. During implementation, we discovered that Cypher queries don't work over the PostgreSQL wire protocol, so all queries were changed to SQL MATCH for consistency across all three runners. Not updating the plan doc since it served its purpose.
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          fetch-depth: 1

      - name: Set up Java
        if: matrix.runner == 'java'
        uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5.2.0
        with:
          java-version: '21'
          distribution: 'temurin'

      - name: Cache Maven repository
        if: matrix.runner == 'java'
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-iam-${{ hashFiles('iam/java/pom.xml') }}
          restore-keys: ${{ runner.os }}-m2-iam-

      - name: Set up Python
        if: matrix.runner == 'python'
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: '3.12'
The commented versions for the GitHub Actions in the CI workflow plan are incorrect and outdated. This can be misleading for future maintenance.
- actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd is v4.1.7, not v6.0.2.
- actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 is v4.2.1, not v5.2.0.
- actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 is v4.0.2, not v5.0.3.
- actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 is v5.1.0, not v5.6.0.
Please update the comments to reflect the correct versions associated with the SHAs.
The SHAs and version comments in the plan doc were copied from the existing CI workflows in this repo. The actual .github/workflows/iam.yml file uses the same pinned SHAs as the other use case workflows. These are historical plan notes, not maintained documentation.
|
Deleting test comment - actual review posted below |
|
PR Review - Part 1: Security and Correctness Issues
- SECURITY (Medium): SQL injection in iam.py Query 3
- SECURITY (Low): Same in Java runQuery3SoxComplianceAudit
- CORRECTNESS: queries.sh Query 3 hardcodes SOX resource names
- CORRECTNESS: Query 7 MATCH filter may miss indirect members
Generated with Claude Code https://claude.com/claude-code
|
PR Review - Part 2: Runtime Risk, Minor Issues, and Positives
- POTENTIAL RUNTIME ISSUE: vectorNeighbors in ORDER BY (Query 6)
- MINOR: Permission type has no unique index on action
- MINOR: Hardcoded 90-day cutoff date needs comment
What works well:
Generated with Claude Code https://claude.com/claude-code
🧪 CI Insights
Here's what we observed from your CI run for d518155.
🟢 All jobs passed!
The `{while: ...}{as: target}` two-block MATCH pattern causes a parse
error in ArcadeDB 26.3.1. Refactored to two-step approach: start from
Platform-Admins group going outward for permissions, then inward for
member identities. Also removed DISTINCT from Java to avoid RemoteDatabase
serialization issue.
Verified all 7 queries pass on all 3 runners (curl, Java, Python).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Query 3: build SOX resource list from prior query via jq
- Query 4: compute SoD violations via jq set intersection
- Query 5: compute dormant identities via jq set difference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review Handling Summary
All 3 runners (curl, java, python) pass CI ✅ and were verified locally before push. |
|
test |
ArcadeDB Use Cases |
Code Review: feat(iam) - Identity and Access Management Use Case. See full review in follow-up comments. |
Bug 1: Dormant detection surfaces Bob too, not just Frank. The 90-day cutoff is 2025-12-06. Bob last accessed on 2025-09-15 (before the cutoff) and has permissions via Contractors -> Engineering -> Platform-Admins -> Admin -> admin. He will appear as dormant alongside Frank. The test plan should say both bob@company.com and frank@company.com are found, not just frank@company.com. |
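The cutoff arithmetic behind this bug can be checked with a short sketch. The reference date of 2026-03-06 is an assumption, inferred only from the fact that subtracting 90 days from it yields the 2025-12-06 cutoff quoted in this comment; the function name `dormancy_cutoff` is likewise hypothetical.

```python
from datetime import date, timedelta


def dormancy_cutoff(reference_date, window_days=90):
    """Return the earliest date that still counts as recent access."""
    return reference_date - timedelta(days=window_days)


# Assumed reference date (not stated in the thread); chosen because it
# reproduces the 2025-12-06 cutoff used by the queries.
reference = date(2026, 3, 6)
cutoff = dormancy_cutoff(reference)

# Bob's last access date as given in the review comment.
bob_last_access = date(2025, 9, 15)

print("cutoff:", cutoff.isoformat())
print("bob dormant:", bob_last_access < cutoff)
```

Deriving the cutoff from a reference date instead of hardcoding it would also address the separate review note that the 90-day cutoff date needs a comment.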
|
Disregard the test/partial comments above - full review below. |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
BUGS:
(1) Dormant detection - the 90-day cutoff is 2025-12-06, Bob last accessed 2025-09-15 and has permissions via Contractors->Engineering->Platform-Admins->Admin, so Bob is ALSO dormant, not just Frank. Test plan needs updating.
(2) INHERITS_FROM edge is defined in schema+data but zero queries traverse it - either use it or remove it.
POTENTIAL ISSUES:
(3) Query 3 dynamic SQL has inconsistent escaping: Java escapes single quotes correctly but Python and Shell do not.
(4) vectorNeighbors() in ORDER BY is not documented as supported in 26.3.1 quirks - confirm via CI and add a note to CLAUDE.md.
(5) iam.yml line 28 comments actions/checkout as v6.0.2 but v4.x is the latest - misleading annotation.
MINOR:
(6) No index on AccessLog.recordedAt despite Queries 3+5 filtering on it.
(7) Permission.action has no UNIQUE index so re-running setup.sh could create duplicate Permission nodes.
GOOD: Follows CLAUDE.md conventions throughout, clean Java 21 record usage, correct psycopg autocommit, fail-fast:false on matrix, well-engineered test scenarios (shadow admin, SoD).
Reviewed by Claude Sonnet 4.6.
…ssion.action
- INHERITS_FROM was defined in schema/data but no query traversed it
- Permission.action now has a UNIQUE index, consistent with other entity types
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review Handling — Claude Sonnet 4.6 Review
All 3 runners verified locally after changes. |
|
test |
|
PR Review posted via Claude Code - see full review below (testing comment posting) |
|
PR Review Part 1 of 3 - HIGH PRIORITY ISSUES:
The Python version builds the SQL IN clause by directly f-string interpolating values fetched from the database with no escaping. A resource name containing a single quote would break the query. The Java counterpart does at least use .replace() to escape single quotes, but the Python version has no such guard. Fix: use psycopg placeholders or apply the same single-quote escaping the Java version uses. The same dynamic-SQL pattern in queries.sh Query 3 (RESOURCE_LIST shell variable) carries similar risk.
Query 6 uses vectorNeighbors in an ORDER BY clause. Per CLAUDE.md, vectorNeighbors returns a ranked neighbor set, not a scalar distance score. Using it in ORDER BY may silently return unordered results or error at runtime. It is typically used in the FROM position. Please verify this executes correctly end-to-end; if not, restructure as: SELECT email, department, identityType FROM vectorNeighbors('Identity[access_pattern_vec]', [...], 10) WHERE identityType = 'employee' |
|
PR Review Part 2 of 3 - MEDIUM PRIORITY + MINOR: MEDIUM PRIORITY:
MINOR:
|
|
PR Review Part 3 of 3 - WHAT IS WELL DONE:
(Review generated with Claude Code) |
|
NOTE: The first comment in this review thread (ID issuecomment-4014489144) was a test comment and can be ignored. The actual review is in the three comments that follow it. |
Summary
psycopg over PostgreSQL wire protocol) alongside the standard shell and Java runners
What's included
queries.sh), Java (IdentityAccessManagement.java), Python (iam.py via psycopg)
[curl, java, python]
Engineered scenarios
Reference
Based on arcadedb.com/iam.html
Test plan
- curl runner: docker compose up -d && ./setup.sh && ./queries/queries.sh (all 7 queries return non-empty results)
- java runner: mvn package && java -jar target/iam.jar (all 7 queries print results)
- python runner: pip install -r requirements.txt && python iam.py (connects via PG protocol, all 7 queries return results)
- bob@company.com
- carol@company.com
- frank@company.com
🤖 Generated with Claude Code