
feat(iam): add Identity & Access Management use case#33

Merged
robfrank merged 5 commits into main from feat/iam
Mar 7, 2026

@robfrank
Contributor

@robfrank robfrank commented Mar 6, 2026

Summary

  • Adds a new Identity & Access Management (IAM) use case demonstrating ArcadeDB's multi-model capabilities for permission resolution, privilege escalation detection, compliance auditing, and behavioral anomaly detection
  • First use case with a Python client (psycopg over PostgreSQL wire protocol) alongside the standard shell and Java runners
  • 7 query patterns across 3 signal types: graph traversal, time-series audit logs, vector similarity

What's included

| Component | Details |
|---|---|
| Schema | 6 vertex types (Identity, Group, Role, Permission, Resource, Policy), 6 edge types (MEMBER_OF, HAS_ROLE, GRANTS, APPLIES_TO, INHERITS_FROM, GOVERNED_BY), AccessLog document type, LSM_VECTOR index |
| Sample data | 8 identities, 5 groups with nested memberships, 6 roles with inheritance, 6 resources, 3 compliance policies, 15 access log entries |
| Queries | Permission Resolution, Shadow Admin Detection, SOX Compliance Audit, Separation of Duties, Dormant Access Detection, Behavioral Anomaly Detection, Impact Analysis |
| Runners | curl (queries.sh), Java (IdentityAccessManagement.java), Python (iam.py via psycopg) |
| CI | 3-runner matrix: [curl, java, python] |

Engineered scenarios

  • Bob (contractor) has shadow admin access to critical resources via 3+ nested group memberships
  • Carol has a separation of duties violation (approve + execute on Payment-API)
  • Frank has dormant access (permissions granted but no recent usage)
  • Carol's access vector deviates from department baseline (anomaly detection)
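To make the shadow-admin scenario concrete, here is a minimal in-memory sketch of permission resolution: follow MEMBER_OF edges for up to three hops (roughly mirroring the `while: ($depth < 3)` bound used in the queries), then HAS_ROLE and GRANTS. The graph is hypothetical and only loosely based on Bob's chain as described in this thread (Contractors -> Engineering -> Platform-Admins -> Admin -> admin). The Developer/read edge, the hop counting, and the rule that an admin grant reached at two or more hops counts as "shadow" are assumptions, not ArcadeDB's MATCH semantics or the PR's exact definition.

```python
from collections import deque

# Hypothetical toy graph, loosely modelled on the Bob scenario in this thread.
# The real vertices and edges live in iam/sql/02-data.sql.
MEMBER_OF = {
    "bob@company.com": ["Contractors"],
    "Contractors": ["Engineering"],
    "Engineering": ["Platform-Admins"],
}
HAS_ROLE = {"Platform-Admins": ["Admin"], "Engineering": ["Developer"]}  # Developer is assumed
GRANTS = {"Admin": ["admin"], "Developer": ["read"]}                      # read is assumed


def groups_within(identity, max_depth=3):
    """Map each group reachable via MEMBER_OF in <= max_depth hops to its hop count."""
    depth_of = {}
    queue = deque([(identity, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for group in MEMBER_OF.get(node, []):
            if group not in depth_of:
                depth_of[group] = depth + 1
                queue.append((group, depth + 1))
    return depth_of


def effective_permissions(identity, max_depth=3):
    """Set of (permission, via_role, via_group, hops) reachable from identity."""
    result = set()
    for group, hops in groups_within(identity, max_depth).items():
        for role in HAS_ROLE.get(group, []):
            for permission in GRANTS.get(role, []):
                result.add((permission, role, group, hops))
    return result


def shadow_admin_paths(identity, min_hops=2):
    """Admin grants reached through nested memberships (hops >= min_hops)."""
    return {p for p in effective_permissions(identity) if p[0] == "admin" and p[3] >= min_hops}


print(sorted(effective_permissions("bob@company.com")))
print(shadow_admin_paths("bob@company.com"))
```

Lowering `max_depth` to 2 drops Platform-Admins from Bob's reachable groups, which is why the depth bound in the real queries matters for this scenario.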

Reference

Based on arcadedb.com/iam.html

Test plan

  • curl runner: docker compose up -d && ./setup.sh && ./queries/queries.sh — all 7 queries return non-empty results
  • java runner: mvn package && java -jar target/iam.jar — all 7 queries print results
  • python runner: pip install -r requirements.txt && python iam.py — connects via PG protocol, all 7 queries return results
  • Shadow admin detection finds bob@company.com
  • SoD violation detection finds carol@company.com
  • Dormant access detection finds frank@company.com (along with bob@company.com and svc-backup@company.com)

🤖 Generated with Claude Code

Demonstrates ArcadeDB's multi-model capabilities for IAM:
- Graph traversal: permission resolution, shadow admin detection, SoD violations
- Time-series: SOX compliance audit logs, dormant access detection
- Vector similarity: behavioral anomaly detection via access pattern embeddings
- PostgreSQL wire protocol: Python client using psycopg

Includes 7 query patterns across 3 runners (curl, Java, Python).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ArcadeDB examples by introducing a robust Identity & Access Management (IAM) use case. It serves as a practical demonstration of how ArcadeDB's multi-model features can be leveraged to build sophisticated security and compliance systems, integrating graph-based access control, time-series auditing, and vector-based anomaly detection within a single database. Furthermore, it expands the client ecosystem by providing a Python implementation, showcasing interoperability via the PostgreSQL wire protocol.

Highlights

  • New IAM Use Case: Introduced a comprehensive Identity & Access Management (IAM) use case demonstrating ArcadeDB's multi-model capabilities.
  • Multi-Model Demonstration: Showcased graph traversal for permission resolution, time-series for audit logs, and vector similarity for behavioral anomaly detection.
  • Python Client Integration: Added the first use case featuring a Python client (psycopg) interacting with ArcadeDB via the PostgreSQL wire protocol.
  • Extensive Query Patterns: Implemented 7 distinct query patterns covering various IAM scenarios like shadow admin detection, separation of duties, and impact analysis.

Changelog
  • CLAUDE.md
    • Updated to list the new IAM use case.
  • README.md
    • Updated to include the IAM use case in the main table.
  • docs/plans/2026-03-06-iam-design.md
    • Added a detailed design document for the IAM use case.
  • docs/plans/2026-03-06-iam.md
    • Added a comprehensive implementation plan for the IAM use case.
  • iam/README.md
    • Added a dedicated README for the IAM use case.
  • iam/docker-compose.yml
    • Added Docker Compose configuration for the IAM use case.
  • iam/java/pom.xml
    • Added Maven pom.xml for the Java client.
  • iam/java/src/main/java/com/arcadedb/examples/IdentityAccessManagement.java
    • Added the Java application for IAM.
  • iam/python/iam.py
    • Added the Python application for IAM.
  • iam/python/requirements.txt
    • Added Python dependencies for psycopg.
  • iam/queries/queries.sh
    • Added shell script for curl-based queries.
  • iam/setup.sh
    • Added database setup script.
  • iam/sql/01-schema.sql
    • Added SQL schema definition.
  • iam/sql/02-data.sql
    • Added SQL sample data.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/iam.yml
Activity
  • The author created a detailed design document (docs/plans/2026-03-06-iam-design.md) and an implementation plan (docs/plans/2026-03-06-iam.md) for the IAM use case.
  • The author scaffolded the directory structure for the new iam/ project.
  • The author implemented the Docker Compose configuration, SQL schema, and sample data for the IAM database.
  • The author developed curl, Java, and Python clients to demonstrate the 7 IAM query patterns.
  • The author updated the root README.md and CLAUDE.md to reflect the new use case.
  • The author added a CI workflow to test the curl, Java, and Python runners for the IAM use case.
  • The author performed extensive testing as outlined in the PR description's "Test plan" and "Success Criteria" sections, verifying all 7 queries and specific engineered scenarios (shadow admin, SoD violation, dormant access, anomaly detection).

@claude

claude bot commented Mar 6, 2026

Test comment from Claude review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive new use case for Identity & Access Management (IAM), showcasing ArcadeDB's multi-model capabilities. The changes include new documentation, design plans, and a full implementation with runners for shell, Java, and Python. The overall structure is excellent and follows the established conventions of the repository. My review focuses on improving the robustness and security of the provided scripts and code. Specifically, I've identified opportunities to prevent potential SQL injection vulnerabilities in the Java and Python code by using parameterized queries. I've also suggested making the shell query script more dynamic to avoid hardcoded values, which will make the demonstration more robust and impressive. Additionally, there are some minor inconsistencies in the planning documents that should be corrected.

Comment on lines +132 to +144
String resourceList = soxResources.stream()
    .map(n -> "'" + n.replace("'", "''") + "'")
    .collect(java.util.stream.Collectors.joining(", "));

String logSql = String.format(
    """
    SELECT identityEmail, action, resourceName, recordedAt, source_ip
    FROM AccessLog
    WHERE resourceName IN [%s]
      AND recordedAt > '2025-12-01 00:00:00'
    ORDER BY recordedAt DESC""", resourceList);

try (ResultSet rs = db.query("sql", logSql)) {

high

The SQL query in runQuery3SoxComplianceAudit is constructed using String.format, which is vulnerable to SQL injection. It's a best practice to use parameterized queries, which are supported by the ArcadeDB Java driver. This improves security and code clarity.

    String logSql =
        """
            SELECT identityEmail, action, resourceName, recordedAt, source_ip
            FROM AccessLog
            WHERE resourceName IN :resources
              AND recordedAt > '2025-12-01 00:00:00'
            ORDER BY recordedAt DESC""";

    try (ResultSet rs = db.query("sql", logSql, java.util.Map.of("resources", soxResources))) {

Contributor Author

ArcadeDB's SQL dialect doesn't support IN :paramName with named parameter binding for lists. The RemoteDatabase.query() method accepts a Map<String, Object> but the SQL parser doesn't interpolate list parameters into IN clauses.

The values here come from our own prior query result (SOX-governed resource names), not from user input, so the injection surface is theoretical. Keeping the current String.format approach since it's the only way that works with ArcadeDB's SQL.

Comment on lines +95 to +102
resource_list = ', '.join(f"'{r}'" for r in sox_resources)
cur.execute(f"""
SELECT identityEmail, action, resourceName, recordedAt, source_ip
FROM AccessLog
WHERE resourceName IN [{resource_list}]
AND recordedAt > '2025-12-01 00:00:00'
ORDER BY recordedAt DESC
""")

high

The SQL query in run_query3 is constructed using an f-string, which makes it vulnerable to SQL injection. psycopg provides a safe way to pass parameters to queries, including lists for IN clauses. Using parameters is a critical security best practice.

    cur.execute("""
        SELECT identityEmail, action, resourceName, recordedAt, source_ip
        FROM AccessLog
        WHERE resourceName = ANY(%s)
          AND recordedAt > '2025-12-01 00:00:00'
        ORDER BY recordedAt DESC
    """, (sox_resources,))

Contributor Author

= ANY(%s) is PostgreSQL-specific syntax. ArcadeDB's PostgreSQL wire protocol emulates the PG protocol but uses its own SQL parser, which requires IN [...] (square brackets, not parentheses) and doesn't support ANY(). The suggested change would fail at query parse time.

The values come from our own prior query result, not user input.
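If the string-built `IN [...]` list stays, the Python runner could at least apply the same quote-doubling that the Java runner uses. A minimal sketch, assuming ArcadeDB's SQL parser treats a doubled single quote inside a string literal as an escaped quote (the Java code relies on this; it is not verified here). The helper names are hypothetical, not functions from iam.py:

```python
def sql_string_literal(value: str) -> str:
    """Single-quote a value, doubling embedded quotes (same rule as the Java
    runner's n.replace("'", "''")). Assumes the parser accepts '' as an escape."""
    return "'" + value.replace("'", "''") + "'"


def in_list(values) -> str:
    """Render values as an ArcadeDB-style square-bracket IN list literal."""
    return "[" + ", ".join(sql_string_literal(v) for v in values) + "]"


def sox_access_log_sql(resources, since: str = "2025-12-01 00:00:00") -> str:
    """Build the Query 3 access-log SQL for the given SOX-governed resources."""
    return (
        "SELECT identityEmail, action, resourceName, recordedAt, source_ip\n"
        "FROM AccessLog\n"
        f"WHERE resourceName IN {in_list(resources)}\n"
        f"  AND recordedAt > {sql_string_literal(since)}\n"
        "ORDER BY recordedAt DESC"
    )


print(sox_access_log_sql(["Production-DB", "Payment-API", "Audit-System"]))
```

Centralising the quoting in one helper keeps the Python and Java runners on the same escaping rule; an allow-list check on resource names would be a stricter alternative.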

Comment on lines +78 to +84
query "sql" "
SELECT identityEmail, action, resourceName, recordedAt, source_ip
FROM AccessLog
WHERE resourceName IN ['Production-DB', 'Payment-API', 'Audit-System']
AND recordedAt > '2025-12-01 00:00:00'
ORDER BY recordedAt DESC
"

high

The second part of "Query 3: SOX Compliance Audit" uses a hardcoded list of resources: WHERE resourceName IN ['Production-DB', 'Payment-API', 'Audit-System']. This is not robust and will produce incorrect results if the underlying data changes. The script should dynamically use the output from the first part of the query to build the second one.

Here is a way to make it dynamic:

SOX_RESOURCES_JSON=$(query "sql" "
SELECT resource
FROM (
  MATCH {type: Resource, as: res}
        .out('GOVERNED_BY'){as: pol, where: (name = 'SOX-Compliance')}
  RETURN res.name AS resource
)
")
RESOURCE_LIST_JSON=$(echo "$SOX_RESOURCES_JSON" | jq '[.[].resource]')

query "sql" "
SELECT identityEmail, action, resourceName, recordedAt, source_ip
FROM AccessLog
WHERE resourceName IN ${RESOURCE_LIST_JSON}
  AND recordedAt > '2025-12-01 00:00:00'
ORDER BY recordedAt DESC
"

Contributor Author

Fixed. Query 3 now dynamically builds the resource list from the SOX-governed resources query result using jq.

Comment on lines +92 to +120
echo "--- Identities with approve permission ---"
query "sql" "
SELECT identity, resource, role
FROM (
MATCH {type: Identity, as: u}
.out('MEMBER_OF'){while: (\$depth < 3)}
.out('HAS_ROLE'){as: r}
.out('GRANTS'){where: (action = 'approve')}
.out('APPLIES_TO'){as: res}
RETURN u.email AS identity, res.name AS resource, r.name AS role
)
"

echo ""
echo "--- Identities with execute permission ---"
query "sql" "
SELECT identity, resource, role
FROM (
MATCH {type: Identity, as: u}
.out('MEMBER_OF'){while: (\$depth < 3)}
.out('HAS_ROLE'){as: r}
.out('GRANTS'){where: (action = 'execute')}
.out('APPLIES_TO'){as: res}
RETURN u.email AS identity, res.name AS resource, r.name AS role
)
"

echo ""
echo "SoD violation: identities appearing in BOTH lists for the same resource (carol@company.com on Payment-API)"

high

"Query 4: Separation of Duties Violation" is not dynamic. It runs two queries and then prints a hardcoded echo statement with the expected result. For a demonstration script, it's much more effective to compute the result dynamically. You can use jq to find the intersection of the results from the two queries.

Here is a suggested implementation:

APPROVERS=$(query "sql" "
SELECT identity, resource, role
FROM (
  MATCH {type: Identity, as: u}
        .out('MEMBER_OF'){while: (\$depth < 3)}
        .out('HAS_ROLE'){as: r}
        .out('GRANTS'){where: (action = 'approve')}
        .out('APPLIES_TO'){as: res}
  RETURN u.email AS identity, res.name AS resource, r.name AS role
)
")

EXECUTORS=$(query "sql" "
SELECT identity, resource, role
FROM (
  MATCH {type: Identity, as: u}
        .out('MEMBER_OF'){while: (\$depth < 3)}
        .out('HAS_ROLE'){as: r}
        .out('GRANTS'){where: (action = 'execute')}
        .out('APPLIES_TO'){as: res}
  RETURN u.email AS identity, res.name AS resource, r.name AS role
)
")

# Use jq to find violations; -r prints the strings without JSON quotes
jq -nr --argjson approvers "$APPROVERS" --argjson executors "$EXECUTORS" '
  ($approvers | map({key: (.identity + "|" + .resource), value: .}) | from_entries) as $approver_map
  | $executors
  | map(
      select($approver_map[.identity + "|" + .resource])
      | "VIOLATION: \(.identity) on \(.resource) | approve via: \($approver_map[.identity + "|" + .resource].role) | execute via: \(.role)"
    )
  | .[]
'

Contributor Author

Fixed. Query 4 now captures both query results and computes the SoD violations dynamically using jq set intersection.
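The same set intersection can be expressed in plain Python, which may help when comparing the shell output with the Java/Python runners. This is a sketch, not the runners' actual code: the row shape (`identity`, `resource`, `role`) follows the SELECT columns above, but the sample rows and role names are invented for illustration.

```python
def sod_violations(approvers, executors):
    """(identity, resource, approve_role, execute_role) for every identity
    that can both approve and execute on the same resource."""
    approve_role = {}
    for row in approvers:
        # Key on an (identity, resource) tuple instead of a concatenated string.
        approve_role.setdefault((row["identity"], row["resource"]), row["role"])
    found = set()
    for row in executors:
        key = (row["identity"], row["resource"])
        if key in approve_role:
            found.add((row["identity"], row["resource"], approve_role[key], row["role"]))
    return sorted(found)


# Illustrative rows; role names are hypothetical.
approvers = [
    {"identity": "carol@company.com", "resource": "Payment-API", "role": "Approver"},
    {"identity": "alice@company.com", "resource": "Production-DB", "role": "Approver"},
]
executors = [
    {"identity": "carol@company.com", "resource": "Payment-API", "role": "Operator"},
    {"identity": "alice@company.com", "resource": "Payment-API", "role": "Operator"},
]

for identity, resource, via_approve, via_execute in sod_violations(approvers, executors):
    print(f"VIOLATION: {identity} on {resource} | approve via: {via_approve} | execute via: {via_execute}")
```

Alice appears in both lists but on different resources, so only Carol's Payment-API pair is reported.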

Comment on lines +128 to +152
echo "--- Step 1: Identities with granted permissions ---"
query "sql" "
SELECT DISTINCT identity, resource
FROM (
MATCH {type: Identity, as: u}
.out('MEMBER_OF'){while: (\$depth < 3)}
.out('HAS_ROLE'){}
.out('GRANTS'){}
.out('APPLIES_TO'){as: res}
RETURN u.email AS identity, res.name AS resource
)
ORDER BY identity
"

echo ""
echo "--- Step 2: Identities with recent access (last 90 days) ---"
query "sql" "
SELECT DISTINCT identityEmail
FROM AccessLog
WHERE recordedAt > '2025-12-06 00:00:00'
ORDER BY identityEmail
"

echo ""
echo "Dormant identities = Step 1 minus Step 2 (bob@company.com, frank@company.com, svc-backup@company.com)"

high

"Query 5: Dormant Access Detection" is not dynamic. It runs two queries and then prints a hardcoded echo statement with the expected result. This should be computed dynamically by finding the set difference between users with permissions and users with recent activity.

Here is a way to compute this dynamically using jq:

GRANTED_USERS=$(query "sql" "
SELECT DISTINCT identity
FROM (
  MATCH {type: Identity, as: u}
        .out('MEMBER_OF'){while: (\$depth < 3)}
        .out('HAS_ROLE'){}
        .out('GRANTS'){}
        .out('APPLIES_TO'){as: res}
  RETURN u.email AS identity
)
")

RECENT_USERS=$(query "sql" "
SELECT DISTINCT identityEmail
FROM AccessLog
WHERE recordedAt > '2025-12-06 00:00:00'
")

echo "Dormant identities (have permissions but no recent access):"
jq -nr --argjson granted "$GRANTED_USERS" --argjson recent "$RECENT_USERS" '
  ($granted | map(.identity)) as $g
  | ($recent | map(.identityEmail)) as $r
  | ($g - $r) | .[]
'

Contributor Author

Fixed. Query 5 now captures both query results and computes dormant identities via jq set difference ($granted - $recent).
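The set difference is equally small in Python. A sketch with illustrative rows: the column names follow the two queries above, while which identities appear in each list is invented, chosen so the result matches the three dormant identities the script was expected to report.

```python
def dormant_identities(granted_rows, recent_rows):
    """Identities holding any permission, minus identities seen in recent AccessLog rows."""
    granted = {row["identity"] for row in granted_rows}
    recent = {row["identityEmail"] for row in recent_rows}
    return sorted(granted - recent)


# Illustrative query results (shape only; list membership is made up).
granted_rows = [{"identity": e} for e in (
    "alice@company.com", "bob@company.com", "carol@company.com",
    "frank@company.com", "svc-backup@company.com")]
recent_rows = [{"identityEmail": "alice@company.com"}, {"identityEmail": "carol@company.com"}]

print(dormant_identities(granted_rows, recent_rows))
```

Using Python sets also removes duplicates, so the result does not depend on the SQL `DISTINCT`.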

Comment on lines +478 to +487
query "cypher" "
MATCH (u:Identity {email: 'alice@company.com'})
-[:MEMBER_OF*1..3]->(g:Group)
-[:HAS_ROLE]->(r:Role)
-[:GRANTS]->(p:Permission)
-[:APPLIES_TO]->(res:Resource)
RETURN res.name AS resource, p.action AS action,
r.name AS via_role, g.name AS via_group
ORDER BY resource, action
"

medium

The implementation plan specifies using cypher for several queries in the queries.sh script (e.g., Query 1, 2, 4, 7). However, the actual implementation in iam/queries/queries.sh and the Java/Python runners uses sql (with SQL MATCH) for all graph queries. The plan should be updated to match the implementation for consistency.

Contributor Author

The plan doc is a historical artifact from before implementation. During implementation, we discovered that Cypher queries don't work over the PostgreSQL wire protocol, so all queries were changed to SQL MATCH for consistency across all three runners. Not updating the plan doc since it served its purpose.

Comment on lines +864 to +887
      uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      with:
        fetch-depth: 1

    - name: Set up Java
      if: matrix.runner == 'java'
      uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5.2.0
      with:
        java-version: '21'
        distribution: 'temurin'

    - name: Cache Maven repository
      if: matrix.runner == 'java'
      uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
      with:
        path: ~/.m2
        key: ${{ runner.os }}-m2-iam-${{ hashFiles('iam/java/pom.xml') }}
        restore-keys: ${{ runner.os }}-m2-iam-

    - name: Set up Python
      if: matrix.runner == 'python'
      uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
      with:
        python-version: '3.12'

medium

The commented versions for the GitHub Actions in the CI workflow plan are incorrect and outdated. This can be misleading for future maintenance.

  • actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd is v4.1.7, not v6.0.2.
  • actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 is v4.2.1, not v5.2.0.
  • actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 is v4.0.2, not v5.0.3.
  • actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 is v5.1.0, not v5.6.0.

Please update the comments to reflect the correct versions associated with the SHAs.

Contributor Author

The SHAs and version comments in the plan doc were copied from the existing CI workflows in this repo. The actual .github/workflows/iam.yml file uses the same pinned SHAs as the other use case workflows. These are historical plan notes, not maintained documentation.

@claude

claude bot commented Mar 6, 2026

Deleting test comment - actual review posted below

@claude

claude bot commented Mar 6, 2026

PR Review - Part 1: Security and Correctness Issues

SECURITY (Medium): SQL injection in iam.py Query 3
Resource names are string-interpolated into SQL via an f-string. Even though the values come from the DB, not user input, this is unsafe. Fix: use a single MATCH+JOIN query to get SOX access logs without string interpolation.

SECURITY (Low): Same in Java runQuery3SoxComplianceAudit
Manual quote-escaping with replace() is slightly safer but still string-concatenated SQL. The single-query approach resolves both.

CORRECTNESS: queries.sh Query 3 hardcodes SOX resource names
Java and Python runners dynamically query via GOVERNED_BY edge, but shell version hardcodes Production-DB, Payment-API, Audit-System. Inconsistent and fragile. Single MATCH+JOIN query fixes all three runners.

CORRECTNESS: Query 7 MATCH filter may miss indirect members
The where:(name=Platform-Admins) inside the MEMBER_OF traversal step prunes intermediate nodes - Bob who travels via Contractors->Engineering->Platform-Admins would be excluded. Filter should only apply to the final matched node. Suggest verifying against a live instance.
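The pruning concern can be illustrated outside ArcadeDB with a toy breadth-first walk. This does not reproduce MATCH semantics; it only contrasts applying the group-name filter at every hop (the behaviour the review suspects) with applying it to the final node. The membership edges, including Alice's direct membership, are hypothetical.

```python
# Hypothetical membership edges (identity or group -> parent groups).
MEMBER_OF = {
    "alice@company.com": ["Platform-Admins"],  # direct member (assumed)
    "bob@company.com": ["Contractors"],
    "Contractors": ["Engineering"],
    "Engineering": ["Platform-Admins"],
}


def reaches(start, target, max_depth=3, filter_every_step=False):
    """Breadth-first walk over MEMBER_OF for at most max_depth hops.

    filter_every_step=True discards every non-target group it meets, modelling a
    filter evaluated on each traversal step; False only checks the end node.
    """
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for group in MEMBER_OF.get(node, []):
                if group == target:
                    return True
                if not filter_every_step:
                    next_frontier.append(group)  # keep walking through other groups
        frontier = next_frontier
    return False


def members(target, **options):
    """Identities (keys containing '@') that reach target under the given options."""
    identities = (n for n in MEMBER_OF if "@" in n)
    return sorted(n for n in identities if reaches(n, target, **options))


print(members("Platform-Admins"))
print(members("Platform-Admins", filter_every_step=True))
```

With the per-step filter, Bob's path dies at Contractors and only the direct member remains, which is the exclusion the review describes.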

Generated with Claude Code https://claude.com/claude-code

@claude

claude bot commented Mar 6, 2026

PR Review - Part 2: Runtime Risk, Minor Issues, and Positives

POTENTIAL RUNTIME ISSUE: vectorNeighbors in ORDER BY (Query 6)
vectorNeighbors returns a k-NN result set, not a scalar distance per row. Using it in ORDER BY may be invalid or produce unexpected results in 26.3.1. If it fails at runtime, Query 6 would need to return results from the vector index directly and rank in the client.
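If ranking does have to move to the client, one possible approach is sketched below: take the element-wise mean of a department's access-pattern vectors as the baseline and sort identities by cosine distance from it. The vectors and the third identity are hypothetical, and plain cosine distance is a stand-in for whatever similarity the LSM_VECTOR index is configured with.

```python
import math

# Hypothetical 3-dimensional access-pattern vectors for one department.
VECTORS = {
    "alice@company.com": [1.0, 0.0, 0.0],
    "dave@company.com": [0.9, 0.1, 0.0],   # hypothetical peer
    "carol@company.com": [0.0, 0.0, 1.0],  # engineered outlier
}


def centroid(vectors):
    """Element-wise mean: the department 'baseline' vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_deviation(vectors_by_identity):
    """Return (distance, identity) pairs, most anomalous first."""
    baseline = centroid(list(vectors_by_identity.values()))
    scored = [(cosine_distance(v, baseline), who) for who, v in vectors_by_identity.items()]
    return sorted(scored, reverse=True)


for distance, who in rank_by_deviation(VECTORS):
    print(f"{who}: {distance:.3f}")
```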

MINOR: Permission type has no unique index on action
All other entity types have unique indexes on their identity property. Adding a unique index on Permission.action would be consistent and prevent silent edge fan-out if 02-data.sql is re-applied.

MINOR: Hardcoded 90-day cutoff date needs comment
The date 2025-12-06 used in Queries 3 and 5 is a fixed demo date. A brief comment explaining it is the demo dataset cutoff (not a live 90-day window) would help readers.
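A worked check of the date, assuming the demo's reference day is 2026-03-06 (the date a later review comment uses): 90 days earlier is 2025-12-06, and Bob's last access on 2025-09-15 falls before it. Deriving the literal from a reference day rather than hardcoding it would also make a live window a one-line change. `cutoff` and `is_dormant` are hypothetical helpers, and the time of day on Bob's access is assumed.

```python
from datetime import date, datetime, timedelta

# Assumption: the demo dataset's reference "today" (not the real current date).
DEMO_TODAY = date(2026, 3, 6)


def cutoff(today: date = DEMO_TODAY, days: int = 90) -> str:
    """Return the AccessLog timestamp literal for midnight `days` before `today`."""
    start = datetime.combine(today - timedelta(days=days), datetime.min.time())
    return start.strftime("%Y-%m-%d %H:%M:%S")


def is_dormant(last_access: str, today: date = DEMO_TODAY, days: int = 90) -> bool:
    """Fixed-width 'YYYY-MM-DD HH:MM:SS' strings compare correctly as text."""
    return last_access < cutoff(today, days)


print(cutoff())                           # 2025-12-06 00:00:00
print(is_dormant("2025-09-15 00:00:00"))  # True: Bob's last access predates the cutoff
# For a live window instead of the demo date: cutoff(date.today())
```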

What works well:

  • Reserved word avoidance is correct: identityType not type, recordedAt not timestamp
  • Group backtick-quoting is consistent across all SQL, Java, and Python files
  • Engineered scenarios (Bob shadow admin, Carol SoD, Frank dormant) are realistic and well-designed
  • CI workflow follows all existing conventions: pinned SHAs, --no-transfer-progress, fail-fast:false, always() teardown
  • Python PG-protocol runner cleanly mirrors Java conventions (try_run/print_header)
  • Design doc and implementation plan in docs/plans/ are thorough

Generated with Claude Code https://claude.com/claude-code

@mergify
Contributor

mergify bot commented Mar 6, 2026

🧪 CI Insights

Here's what we observed from your CI run for d518155.

🟢 All jobs passed!

But CI Insights is watching 👀

robfrank and others added 2 commits March 6, 2026 23:02
The `{while: ...}{as: target}` two-block MATCH pattern causes a parse
error in ArcadeDB 26.3.1. Refactored to two-step approach: start from
Platform-Admins group going outward for permissions, then inward for
member identities. Also removed DISTINCT from Java to avoid RemoteDatabase
serialization issue.

Verified all 7 queries pass on all 3 runners (curl, Java, Python).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Query 3: build SOX resource list from prior query via jq
- Query 4: compute SoD violations via jq set intersection
- Query 5: compute dormant identities via jq set difference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@robfrank
Contributor Author

robfrank commented Mar 6, 2026

Review Handling Summary

| # | File | Comment | Action | Rationale |
|---|---|---|---|---|
| 1 | IdentityAccessManagement.java:144 | SQL injection in Query 3: use IN :resources | Pushed back | ArcadeDB SQL doesn't support IN :paramName with list parameter binding. Data comes from our own prior query result, not user input. |
| 2 | iam.py:102 | SQL injection in Query 3: use ANY(%s) | Pushed back | ANY(%s) is PostgreSQL-specific syntax. ArcadeDB's PG wire protocol uses its own SQL parser with IN [...] (square brackets). The suggested change would fail at parse time. |
| 3 | queries.sh:84 | Query 3 hardcodes SOX resource list | Fixed in 3cade2d | Now dynamically builds resource list from prior query result via jq. |
| 4 | queries.sh:120 | Query 4 prints hardcoded SoD violation | Fixed in 3cade2d | Now computes violations dynamically via jq set intersection. |
| 5 | queries.sh:152 | Query 5 prints hardcoded dormant identities | Fixed in 3cade2d | Now computes dormant set dynamically via jq set difference ($granted - $recent). |
| 6 | 2026-03-06-iam.md:487 | Plan doc mentions Cypher but impl uses SQL MATCH | Not updating | Plan doc is a historical artifact. Cypher was changed to SQL MATCH during implementation because Cypher isn't available over the PG wire protocol. |
| 7 | 2026-03-06-iam.md:887 | Plan doc has incorrect action SHA version comments | Not updating | Historical plan notes. The actual CI workflow uses the same pinned SHAs as all other use case workflows in the repo. |

All 3 runners (curl, java, python) pass CI ✅ and were verified locally before push.

@claude

claude bot commented Mar 6, 2026

test

@claude

claude bot commented Mar 6, 2026

ArcadeDB Use Cases

@claude

claude bot commented Mar 6, 2026

Code Review: feat(iam) - Identity and Access Management Use Case. See full review in follow-up comments.

@claude

claude bot commented Mar 6, 2026

Bug 1: Dormant detection surfaces Bob too, not just Frank. The 90-day cutoff is 2025-12-06. Bob last accessed on 2025-09-15 (before the cutoff) and has permissions via Contractors -> Engineering -> Platform-Admins -> Admin -> admin. He will appear as dormant alongside Frank. The test plan should say both bob@company.com and frank@company.com are found, not just frank@company.com.

@claude

claude bot commented Mar 6, 2026

Disregard the test/partial comments above - full review below.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Mar 6, 2026

BUGS:
(1) Dormant detection: the 90-day cutoff is 2025-12-06, and Bob last accessed on 2025-09-15 and has permissions via Contractors->Engineering->Platform-Admins->Admin, so Bob is ALSO dormant, not just Frank. Test plan needs updating.
(2) INHERITS_FROM edge is defined in schema+data but zero queries traverse it - either use it or remove it.
POTENTIAL ISSUES:
(3) Query 3 dynamic SQL has inconsistent escaping: Java escapes single quotes correctly but Python and Shell do not.
(4) vectorNeighbors() in ORDER BY is not documented as supported in 26.3.1 quirks - confirm via CI and add a note to CLAUDE.md.
(5) iam.yml line 28 comments actions/checkout as v6.0.2 but v4.x is the latest - misleading annotation.
MINOR:
(6) No index on AccessLog.recordedAt despite Queries 3+5 filtering on it.
(7) Permission.action has no UNIQUE index, so re-running setup.sh could create duplicate Permission nodes.
GOOD: Follows CLAUDE.md conventions throughout, clean Java 21 record usage, correct psycopg autocommit, fail-fast:false on matrix, well-engineered test scenarios (shadow admin, SoD).
Reviewed by Claude Sonnet 4.6.

…ssion.action

- INHERITS_FROM was defined in schema/data but no query traversed it
- Permission.action now has a UNIQUE index, consistent with other entity types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@robfrank
Contributor Author

robfrank commented Mar 6, 2026

Review Handling — Claude Sonnet 4.6 Review

| # | Comment | Action | Rationale |
|---|---|---|---|
| 1 | Dormant detection should include Bob | Already correct | Implementation already finds bob, frank, and svc-backup as dormant. Verified locally: all 3 runners output the correct set. |
| 2 | INHERITS_FROM defined but never traversed | Fixed in d518155 | Removed edge type from schema, data, and README. No query used it. |
| 3 | Query 3 dynamic SQL inconsistent escaping | Pushed back | Resource names come from our own prior query, not user input. Java already escapes quotes; shell now uses jq (from prior fix). ArcadeDB doesn't support parameterized list binding. |
| 4 | vectorNeighbors in ORDER BY undocumented | Pushed back | Confirmed working: all 3 runners pass CI and local testing. Query 6 returns correct results. |
| 5 | iam.yml action SHA version comments misleading | Pushed back | SHAs match all other workflows in the repo. Already addressed in prior review round. |
| 6 | No index on AccessLog.recordedAt | Pushed back | 15-row demo dataset. Adding indexes for performance here is over-engineering. |
| 7 | Permission.action has no UNIQUE index | Fixed in d518155 | Added CREATE INDEX IF NOT EXISTS ON Permission (action) UNIQUE for consistency with other entity types. |

All 3 runners verified locally after changes.

@claude

claude bot commented Mar 6, 2026

test

@claude

claude bot commented Mar 6, 2026

PR Review posted via Claude Code - see full review below (testing comment posting)

@claude

claude bot commented Mar 6, 2026

PR Review Part 1 of 3 - HIGH PRIORITY ISSUES:

  1. SQL injection in iam.py Query 3 (SOX Audit)

The Python version builds the SQL IN clause by directly f-string interpolating values fetched from the database with no escaping. A resource name containing a single quote would break the query. The Java counterpart does at least use .replace() to escape single quotes, but the Python version has no such guard. Fix: use psycopg placeholders or apply the same single-quote escaping the Java version uses. The same dynamic-SQL pattern in queries.sh Query 3 (RESOURCE_LIST shell variable) carries similar risk.

  2. vectorNeighbors in ORDER BY may not work as intended

Query 6 uses vectorNeighbors in an ORDER BY clause. Per CLAUDE.md, vectorNeighbors returns a ranked neighbor set, not a scalar distance score. Using it in ORDER BY may silently return unordered results or error at runtime. It is typically used in the FROM position. Please verify this executes correctly end-to-end; if not, restructure as: SELECT email, department, identityType FROM vectorNeighbors('Identity[access_pattern_vec]', [...], 10) WHERE identityType = 'employee'

@claude

claude bot commented Mar 6, 2026

PR Review Part 2 of 3 - MEDIUM PRIORITY + MINOR:

MEDIUM PRIORITY:

  1. MEMBER_OF edge type: schema doc vs. actual data - The design doc documents MEMBER_OF as Identity -> Group, but the data file creates Group->Group edges (Contractors -> Engineering -> Platform-Admins). This is the correct modeling choice for nested group memberships (the core of Bob's shadow admin chain). The schema documentation should reflect Identity|Group -> Group to match the actual data.

  2. Missing UNIQUE constraint on Permission.action - Every other entity (Identity, Group, Role, Resource, Policy) has a UNIQUE index on its natural key, but Permission.action does not. Re-running setup.sh could create duplicate read/admin/etc. permission nodes, silently multiplying traversal results. Suggest adding: CREATE INDEX IF NOT EXISTS ON Permission (action) UNIQUE

  3. Hardcoded 90-day cutoff - The dormant access queries hardcode 2025-12-06 00:00:00 as the 90-days-ago threshold. This will silently become stale as time passes. A README note clarifying that dates are hardcoded relative to 2026-03-06 would set expectations clearly.

MINOR:

  1. Python connection uses try/finally conn.close() rather than the context manager idiom. Not a bug, just style.

  2. ARCADEDB_PG_PORT is not in the CI env block. The default of 5432 matches docker-compose, so it works - but explicit is better than implicit for discoverability alongside the other ARCADEDB_* vars.

  3. docker compose down without -v leaves volumes behind. Not an issue on ephemeral CI runners, but affects local developer teardown.

@claude

claude bot commented Mar 6, 2026

PR Review Part 3 of 3 - WHAT IS WELL DONE:

  • tryRun()/printHeader() in Java, try_run()/print_header() in Python - exactly mirrors the established convention across all use cases.
  • Action-pinned SHAs throughout - consistent with the repo security posture.
  • Three-runner matrix with fail-fast: false so a Python failure does not mask Java results.
  • Two-step approach for SoD and dormant detection (shell jq set difference, Java HashSet difference) is pragmatic given ArcadeDB 26.3.1 subquery limitations and well-documented in comments.
  • Consistent backtick-quoting of the Group type everywhere it is needed.
  • Engineered data scenarios that guarantee non-trivial results per query (Bob=shadow admin, Carol=SoD+anomaly, Frank=dormant).
  • Dependabot configured for all three new package ecosystems (Maven, pip, Docker).

(Review generated with Claude Code)

@claude

claude bot commented Mar 6, 2026

NOTE: The first comment in this review thread (ID issuecomment-4014489144) was a test comment and can be ignored. The actual review is in the three comments that follow it.

@robfrank robfrank merged commit ac2d831 into main Mar 7, 2026
8 checks passed