Conversation

@knudtty (Contributor) commented Oct 10, 2025

No description provided.

changeset-bot commented Oct 10, 2025

🦋 Changeset detected

Latest commit: 4d36473

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages:

| Name | Type |
| --- | --- |
| @hyperdx/common-utils | Minor |
| @hyperdx/app | Minor |
| @hyperdx/api | Minor |


vercel bot commented Oct 10, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| hyperdx-v2-oss-app | Ready | Preview | Comment | Oct 10, 2025 8:37pm |

claude bot commented Oct 10, 2025

Code Review: JSON Filter Keys via Dynamic Column Metadata

Thanks for this PR! This is an interesting approach to re-enable JSON filter keys by querying ClickHouse system tables instead of scanning the actual data. I've identified several issues and concerns that should be addressed:


🔴 Critical Issues

1. Brittle Substream Name Parsing

```ts
const decodedKey = decodeURIComponent(row.encoded_key);
keys.push({
  key: decodedKey.slice(
    `${column}.`.length,
    decodedKey.lastIndexOf('.dynamic_structure'),
  ),
  chType: 'String',
});
```

Problem: The code assumes the substream format is exactly {column}.{key}.dynamic_structure, but this may not be true for all ClickHouse versions or configurations. The string slicing is brittle and could fail silently.

Recommendation:

  • Add validation to check if the expected format is present
  • Add error handling for malformed substream names
  • Consider using a regex pattern to extract the key more robustly (sketched below)
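
A minimal sketch of that extraction, assuming substream names follow the {column}.{key}.dynamic_structure shape; the helper name and the null-on-failure contract are illustrative, not from the PR:

```ts
// Illustrative helper (not the PR's code): returns null instead of a
// silently wrong slice when a substream name doesn't match the
// expected shape.
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function extractJSONKey(column: string, encodedKey: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(encodedKey);
  } catch {
    return null; // malformed percent-encoding
  }
  const match = decoded.match(
    new RegExp(`^${escapeRegExp(column)}\\.(.+)\\.dynamic_structure$`),
  );
  return match ? match[1] : null;
}
```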

2. Incomplete Filter Key Escaping in Frontend

```ts
keys: [
  key
    .split('.')
    .map(v => `\`${v}\``)
    .join('.'),
],
```

Problem: This escaping logic only handles dots in key names, but doesn't account for:

  • Keys that already contain backticks
  • Keys with special characters that need escaping
  • Empty segments after splitting

Recommendation:

```ts
keys: [
  key
    .split('.')
    .filter(v => v.length > 0)  // Filter empty segments
    .map(v => `\`${v.replace(/`/g, '\\`')}\``)  // Escape existing backticks
    .join('.'),
],
```
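
For instance, pulled out into a standalone function (the name is illustrative), the revised logic behaves as:

```ts
// Standalone version of the escaping above, for illustration only.
const escapeKey = (key: string): string =>
  key
    .split('.')
    .filter(v => v.length > 0)
    .map(v => `\`${v.replace(/`/g, '\\`')}\``)
    .join('.');

escapeKey('http.status'); // `http`.`status`
escapeKey('a..b');        // `a`.`b`        (empty segment dropped)
escapeKey('odd`name');    // `odd\`name`    (embedded backtick escaped)
```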

3. Missing Test Coverage

The getJSONKeys function has no tests in packages/common-utils/src/__tests__/metadata.test.ts, despite being a critical function that was previously disabled (HDX-2480).

Recommendation: Add comprehensive tests covering (a starting sketch follows the list):

  • Normal operation with valid substreams
  • Edge cases (empty results, malformed substreams)
  • Multiple nested JSON keys
  • Keys with special characters
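
A starting point, assuming the parsing is factored into a helper like the extractJSONKey sketch in item 1 and exported for testing; the column name 'Body' and the import path are illustrative:

```ts
// Hypothetical additions to packages/common-utils/src/__tests__/metadata.test.ts
import { extractJSONKey } from '../metadata'; // assumes the helper is exported

describe('getJSONKeys substream parsing', () => {
  it('extracts a simple nested key', () => {
    expect(extractJSONKey('Body', 'Body.user.id.dynamic_structure')).toBe(
      'user.id',
    );
  });

  it('decodes URL-encoded segments', () => {
    expect(extractJSONKey('Body', 'Body.user%20name.dynamic_structure')).toBe(
      'user name',
    );
  });

  it('rejects substreams missing the dynamic_structure suffix', () => {
    expect(extractJSONKey('Body', 'Body.user.id')).toBeNull();
  });

  it('rejects malformed percent-encoding', () => {
    expect(extractJSONKey('Body', 'Body.%ZZ.dynamic_structure')).toBeNull();
  });
});
```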

⚠️ Performance & Reliability Concerns

4. Unbounded Query on System Tables

```ts
clickhouse_settings: {
  max_rows_to_read: '0',  // Unlimited!
  read_overflow_mode: 'break',
  ...this.getClickHouseSettings(),
},
```

Problem: Setting max_rows_to_read: '0' means unlimited rows. For large databases with many parts, this could:

  • Consume significant memory
  • Take a long time to execute
  • Impact overall system performance

Recommendation:

  • Add a reasonable limit (e.g., max_rows_to_read: '10000')
  • Add a LIMIT clause to the query itself
  • Consider filtering only active parts: AND active = 1 (all three combined in the sketch below)
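
A sketch combining the three suggestions via the @clickhouse/client query API. The SELECT/FROM lines stand in for the PR's actual query (elided), `client` and the filter variables are assumed to be in scope, and the cap value is illustrative rather than tuned:

```ts
const MAX_JSON_KEY_ROWS = 10_000; // illustrative cap, not tuned

const result = await client.query({
  query: `
    SELECT DISTINCT encoded_key
    FROM system.parts_columns
    -- ...the rest of the PR's query, unchanged and elided here...
    WHERE database = {database: String}
      AND table = {table: String}
      AND column = {column: String}
      AND active = 1                              -- only live parts
      AND encoded_key LIKE '%dynamic_structure%'  -- see item 6
    LIMIT ${MAX_JSON_KEY_ROWS}
  `,
  query_params: { database, table, column },
  format: 'JSONEachRow',
  clickhouse_settings: {
    max_rows_to_read: String(MAX_JSON_KEY_ROWS), // was '0' (unlimited)
    read_overflow_mode: 'break',
  },
});
```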

5. Potential Duplicate Keys

The query uses SELECT DISTINCT on encoded_key, but multiple parts can share the same column structure, so the scan still reads every matching part even though the output is deduplicated. To cut that work:

  • Limit to only active parts
  • Consider using system.columns instead of system.parts_columns for better performance

6. ILIKE Pattern Matching Performance

```sql
WHERE ... AND encoded_key ILIKE '%dynamic_structure%'
```

Problem: ILIKE with leading wildcard (%) prevents index usage and requires a full scan.

Recommendation: Since you're already filtering by database, table, and column, consider:

  • Using LIKE instead (case-sensitive, faster; shown below)
  • Adding additional filters to reduce the result set before pattern matching
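
Concretely, only the operator changes (shown against the same elided query):

```sql
-- LIKE avoids ILIKE's per-row case folding; the suffix is a fixed
-- lowercase literal in the parsing code, so case-insensitivity buys
-- nothing here.
WHERE ... AND encoded_key LIKE '%dynamic_structure%'
```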

🟡 Code Quality Issues

7. Lost Parameter: maxKeys

The old implementation accepted a maxKeys parameter (default 1000) but never used it; the new implementation drops it entirely.

Recommendation: Either:

  • Add LIMIT ${maxKeys} to the query (sketched below)
  • Remove the parameter from the function signature if not needed
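
A sketch of the first option, binding maxKeys as a query parameter instead of dropping it (the surrounding query is elided as in the item 4 sketch):

```ts
// Thread the existing maxKeys parameter through as a bound value.
const result = await client.query({
  query: `
    SELECT DISTINCT encoded_key
    FROM system.parts_columns
    -- ...same filters as the item 4 sketch...
    LIMIT {maxKeys: UInt32}
  `,
  query_params: { database, table, column, maxKeys },
  format: 'JSONEachRow',
});
```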

8. Type Assumption

```ts
chType: 'String',
```

The new implementation hardcodes all types as 'String', whereas the old implementation extracted actual types from ClickHouse. This could cause issues downstream if type information is used for query construction or validation.

Recommendation:

  • Document why all types are treated as String
  • Or, consider querying actual type information from ClickHouse

9. Missing Changeset Details

The changeset file .changeset/short-geese-beg.md lacks a description of what changed and why. This makes it harder for users to understand the impact.

Recommendation: Add a meaningful description:

```md
---
"@hyperdx/common-utils": minor
"@hyperdx/app": minor
---

Re-enable JSON filter keys by querying ClickHouse system.parts_columns metadata instead of scanning table data. This improves performance for large datasets while restoring the ability to filter on nested JSON fields.
```

📋 Additional Recommendations

10. Consider Query Validation

Since you're querying system tables now instead of actual data, there's a risk of:

  • Keys appearing in metadata but not actually present in recent data
  • Stale metadata from old/inactive parts

Consider adding documentation or a refresh mechanism.

11. Error Handling

Add try-catch and meaningful error messages:

```ts
.then(d => {
  const keys: { key: string; chType: string }[] = [];
  try {
    for (const row of d.data) {
      const decodedKey = decodeURIComponent(row.encoded_key);
      const startIdx = `${column}.`.length;
      const endIdx = decodedKey.lastIndexOf('.dynamic_structure');

      if (endIdx === -1 || endIdx <= startIdx) {
        console.warn(`Unexpected substream format: ${row.encoded_key}`);
        continue;
      }

      keys.push({
        key: decodedKey.slice(startIdx, endIdx),
        chType: 'String',
      });
    }
  } catch (error) {
    console.error('Error parsing JSON keys from substreams:', error);
    throw error;
  }
  return keys;
})
```

✅ Positive Aspects

  • Good approach to avoid expensive data scans
  • Properly maintains caching mechanism
  • Removes the temporary disable workaround for HDX-2480
  • Frontend escaping logic is a good addition

🎯 Summary

Must Fix Before Merge:

  1. Add robust parsing with validation for substream format
  2. Add test coverage for getJSONKeys
  3. Add proper limits to prevent unbounded queries
  4. Fix frontend key escaping to handle edge cases

Should Consider:

  5. Type information loss
  6. ILIKE performance
  7. Active parts filtering

Nice to Have:

  8. Better changeset description
  9. Error handling improvements

Overall, this is a promising approach but needs hardening before production use. The switch from data scanning to metadata querying is sound, but the implementation needs more defensive programming.

E2E Test Results

1 test failed • 23 passed • 3 skipped • 416s

| Status | Count |
| --- | --- |
| ✅ Passed | 23 |
| ❌ Failed | 1 |
| ⚠️ Flaky | 1 |
| ⏭️ Skipped | 3 |

View full report →
