-
Notifications
You must be signed in to change notification settings - Fork 4
ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647
Conversation
…own. This is pure refactoring with no change of functionality.
|
This pull request has been ignored for the connected project Preview Branches by Supabase. |
|
Explanations about the breakup here: https://www.loom.com/share/afa48744aa0441628ec182a2147e8321 |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughThis PR introduces new utility modules for exporting Roam content to various formats. It creates Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (3)
apps/roam/src/utils/getExportTypes.ts (1)
224-241: TODO comment indicates future consolidation opportunity.The comment at line 224 notes that
treeNodeToMarkdownlogic should be reused with the markdown export. SincepageToMarkdown.tsnow exists with atoMarkdownfunction, this could potentially be addressed in a follow-up PR to reduce duplication between the PDF and Markdown exports.Would you like me to open an issue to track consolidating the PDF export's
treeNodeToMarkdownwith the existingtoMarkdownfrompageToMarkdown.ts?apps/roam/src/utils/exportUtils.ts (1)
8-17: Consider performance implications ofuniqJsonArray.The implementation uses
JSON.stringifywith sorted entries for deduplication, which works correctly but has O(n * m log m) complexity where m is the number of keys per object. For large arrays with complex objects, this could be slow.For the current use case (filtering relation data), this should be acceptable, but worth noting if used with larger datasets.
apps/roam/src/utils/pageToMarkdown.ts (1)
96-110: YAML frontmatter escaping is limited to the text field.Line 101-102 correctly escapes quotes in the
textfield, but other dynamic fields (line 104) are inserted without escaping. Values containing YAML special characters (:,\n,#, etc.) could produce malformed YAML.Since this is for user-controlled exports, it's not a security issue, but could cause parsing errors when importing the exported markdown.
🔎 Optional: Comprehensive YAML escaping
if (capt === "text") { - // Wrap title in quotes and escape additional quotes - const escapedText = result[capt].toString().replace(/"/g, '\\"'); - return `"${escapedText}"`; + const escapedText = result[capt].toString().replace(/"/g, '\\"'); + return `"${escapedText}"`; } - return result[capt].toString(); + const value = result[capt].toString(); + // Quote values containing YAML special characters + if (/[:\n#[\]{}|>&*!?]/.test(value) || value.includes('"')) { + return `"${value.replace(/"/g, '\\"')}"`; + } + return value;
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
apps/roam/src/utils/exportUtils.tsapps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.ts
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/main.mdc)
**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefertypeoverinterfacein TypeScript
Use explicit return types for functions
Avoidanytypes when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**/*.{js,ts,tsx,jsx,json}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Prefer existing dependencies from package.json when working on the Roam Research extension
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**/*.{ts,tsx,jsx,js,css,scss}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
apps/roam/**
📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)
Implement the Discourse Graph protocol in the Roam Research extension
Files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
🧠 Learnings (10)
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-07-13T16:47:14.352Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-07-13T16:47:14.352Z
Learning: In the discourse-graph codebase, types.gen.ts contains automatically generated database function type definitions that may have reordered signatures between versions. This reordering is expected behavior from the code generation process and should not be flagged as an issue requiring fixes.
Applied to files:
apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-07-21T14:22:20.752Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 291
File: packages/database/supabase/functions/create-space/index.ts:0-0
Timestamp: 2025-07-21T14:22:20.752Z
Learning: In the discourse-graph codebase, types.gen.ts is not accessible from Supabase edge functions, requiring duplication of types and utilities when needed in the edge function environment at packages/database/supabase/functions/.
Applied to files:
apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.tsapps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-11-25T00:52:41.934Z
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,jsx,js,css,scss} : Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension
Applied to files:
apps/roam/src/utils/getExportTypes.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.
Applied to files:
apps/roam/src/utils/getExportTypes.tsapps/roam/src/utils/pageToMarkdown.ts
📚 Learning: 2025-12-07T20:54:20.007Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Applied to files:
apps/roam/src/utils/exportUtils.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.
Applied to files:
apps/roam/src/utils/exportUtils.ts
🧬 Code graph analysis (2)
apps/roam/src/utils/getExportTypes.ts (4)
apps/roam/src/utils/types.ts (1)
Result(42-46)apps/roam/src/utils/exportUtils.ts (1)
getPageData(19-67)apps/roam/src/utils/getExportSettings.ts (1)
getExportSettings(112-125)apps/roam/src/utils/pageToMarkdown.ts (1)
pageToMarkdown(212-351)
apps/roam/src/utils/exportUtils.ts (1)
apps/roam/src/utils/types.ts (1)
Result(42-46)
🪛 ast-grep (0.40.3)
apps/roam/src/utils/exportUtils.ts
[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (10)
apps/roam/src/utils/getExportTypes.ts (3)
14-22: Clean modular imports from new utility files.The refactoring correctly imports the required utilities from the new modules. This aligns well with the PR's goal of factoring out utility functions.
84-132: Markdown export callback cleanly refactored.The callback properly delegates to
pageToMarkdownwhile preserving the progress update pattern and sequential processing. The early guard at line 85 improves robustness.
55-79: Incorrect claim about Neo4j export parameter.The review incorrectly states that Neo4j (line 152) includes
isExportDiscourseGraph, but it doesn't. The actual pattern shows: JSON and Neo4j both omit the parameter (filtering results to discourse nodes), while Markdown and PDF include it (exporting all content). This appears to be intentional design: structured data exports filter by node type, while document exports preserve all content.Likely an incorrect or invalid review comment.
apps/roam/src/utils/exportUtils.ts (2)
122-129: Good link type handling with appropriate fallback.The
toLinkfunction correctly handles the three link types (wikilinks, alias, roam url) with a sensible plain filename fallback. The extension removal for wikilinks and roam URLs matches expected Markdown/wiki conventions.
131-153: Well-structured recursive tree conversion utilities.
pullBlockToTreeNodecorrectly maps Roam's PullBlock structure to the TreeNode interface with appropriate defaults.collectUidsprovides a clean recursive UID collection. Both functions are idiomatic and readable.apps/roam/src/utils/pageToMarkdown.ts (5)
24-34: Clear regex patterns for Roam embed syntax.The
MATCHES_NONEpattern (/$.+^/) is a clever approach to create a regex that never matches, avoiding conditional logic. The embed regexes correctly capture the 9-10 character block UIDs. Good documentation comments explain the syntax being matched.
42-75: Clean conditional discourse context handling.The function uses early returns appropriately and cleanly handles the optional
appendRefNodeContextfeature. The logic for finding and appending referenced discourse nodes is well-structured.
142-161: Comprehensive text processing with proper ordering.The text processing chain correctly handles:
- Embeds and embed-children (with recursion)
- Block references
- TODO/DONE checkbox conversion
- Roam italic syntax conversion
- Code block formatting
The use of
MATCHES_NONEto conditionally skip regex replacements is an elegant pattern.
280-348: Well-structured content assembly with clear section formatting.The template literal construction correctly builds the markdown document with:
- Frontmatter
- Main content from tree children
- Optional Discourse Context section
- Optional References section
The conditional sections use appropriate guards and formatting.
253-264: Verify the escaping behavior ofnormalizePageTitlefrom the roamjs-components library.The concern about query injection via string interpolation at line 256 is theoretically valid—if
normalizePageTitledoesn't escape quotes in page titles, a title likefoo"barwould break the Datalog syntax. However, the implementation ofnormalizePageTitlefrom the externalroamjs-components(v0.85.7) library cannot be verified from public sources. The pattern of string interpolation is consistent throughout the codebase for various user inputs, and no documented security issues exist for this pattern in Roam API calls. Clarify whethernormalizePageTitleproperly escapes special characters (particularly quotes) for use in Datalog string contexts, or consider using Datalog's parameter binding (:in $clause) as a safer alternative if available.
| const allResults = results || []; | ||
|
|
||
| if (isExportDiscourseGraph) return allResults as DiscourseExportResult[]; | ||
|
|
||
| const matchedTexts = new Set(); | ||
| return allNodes.flatMap((n) => | ||
| (allResults | ||
| ? allResults.flatMap((r) => | ||
| Object.keys(r) | ||
| .filter((k) => k.endsWith(`-uid`) && k !== "text-uid") | ||
| .map((k) => ({ | ||
| ...r, | ||
| text: r[k.slice(0, -4)].toString(), | ||
| uid: r[k] as string, | ||
| })) | ||
| .concat({ | ||
| text: r.text, | ||
| uid: r.uid, | ||
| }), | ||
| ) | ||
| : ( | ||
| window.roamAlphaAPI.q( | ||
| "[:find (pull ?e [:block/uid :node/title]) :where [?e :node/title _]]", | ||
| ) as [Record<string, string>][] | ||
| ).map(([{ title, uid }]) => ({ | ||
| text: title, | ||
| uid, | ||
| })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant truthiness check on allResults.
Line 28 assigns const allResults = results || [], ensuring allResults is always an array. The ternary at line 34 (allResults ? ... : ...) will therefore always take the truthy branch, making the fallback Roam query (lines 48-55) unreachable dead code.
🔎 Suggested fix
Either remove the dead code branch, or if the fallback is intentional for when results is explicitly undefined, change line 28 to preserve that distinction:
- const allResults = results || [];
+ const allResults = results;
- if (isExportDiscourseGraph) return allResults as DiscourseExportResult[];
+ if (isExportDiscourseGraph) return (allResults || []) as DiscourseExportResult[];Or simply remove lines 48-55 if they're no longer needed.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const allResults = results || []; | |
| if (isExportDiscourseGraph) return allResults as DiscourseExportResult[]; | |
| const matchedTexts = new Set(); | |
| return allNodes.flatMap((n) => | |
| (allResults | |
| ? allResults.flatMap((r) => | |
| Object.keys(r) | |
| .filter((k) => k.endsWith(`-uid`) && k !== "text-uid") | |
| .map((k) => ({ | |
| ...r, | |
| text: r[k.slice(0, -4)].toString(), | |
| uid: r[k] as string, | |
| })) | |
| .concat({ | |
| text: r.text, | |
| uid: r.uid, | |
| }), | |
| ) | |
| : ( | |
| window.roamAlphaAPI.q( | |
| "[:find (pull ?e [:block/uid :node/title]) :where [?e :node/title _]]", | |
| ) as [Record<string, string>][] | |
| ).map(([{ title, uid }]) => ({ | |
| text: title, | |
| uid, | |
| })) | |
| const allResults = results; | |
| if (isExportDiscourseGraph) return (allResults || []) as DiscourseExportResult[]; | |
| const matchedTexts = new Set(); | |
| return allNodes.flatMap((n) => | |
| (allResults | |
| ? allResults.flatMap((r) => | |
| Object.keys(r) | |
| .filter((k) => k.endsWith(`-uid`) && k !== "text-uid") | |
| .map((k) => ({ | |
| ...r, | |
| text: r[k.slice(0, -4)].toString(), | |
| uid: r[k] as string, | |
| })) | |
| .concat({ | |
| text: r.text, | |
| uid: r.uid, | |
| }), | |
| ) | |
| : ( | |
| window.roamAlphaAPI.q( | |
| "[:find (pull ?e [:block/uid :node/title]) :where [?e :node/title _]]", | |
| ) as [Record<string, string>][] | |
| ).map(([{ title, uid }]) => ({ | |
| text: title, | |
| uid, | |
| })) |
🤖 Prompt for AI Agents
In apps/roam/src/utils/exportUtils.ts around lines 28 to 55, the code does
`const allResults = results || []` then uses a ternary `allResults ? ... : ...`,
making the false branch unreachable; either remove the unreachable Roam query
fallback branch (lines ~48-55) or preserve the distinction between undefined and
empty array by changing the assignment to keep undefined when results is
undefined (e.g., `const allResults = results;`), then update the ternary
accordingly so the fallback executes only when results is truly undefined.
| if (!nodeFormat) return title; | ||
| const regex = new RegExp( | ||
| `^${nodeFormat | ||
| .replace(/\[/g, "\\[") | ||
| .replace(/]/g, "\\]") | ||
| .replace("{content}", "(.*?)") | ||
| .replace(/{[^}]+}/g, "(?:.*?)")}$`, | ||
| ); | ||
| return regex.exec(title)?.[1] || title; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find DiscourseNode type definition and format field
rg -n "type DiscourseNode|interface DiscourseNode" -A 20Repository: DiscourseGraphs/discourse-graph
Length of output: 22213
🏁 Script executed:
# Check how format patterns are created/validated in the codebase
rg -n "\.format\s*=" apps/roam/src -B 2 -A 2Repository: DiscourseGraphs/discourse-graph
Length of output: 57
🏁 Script executed:
# Look at the full context of exportUtils.ts around lines 79-87
cat -n apps/roam/src/utils/exportUtils.ts | head -100 | tail -50Repository: DiscourseGraphs/discourse-graph
Length of output: 1584
🏁 Script executed:
# Check if there's any validation of format field when it's set/saved
rg -n "format" apps/roam/src -l | head -20Repository: DiscourseGraphs/discourse-graph
Length of output: 933
🏁 Script executed:
# Search for format field validation or sanitization
rg -n "format.*validate|validateFormat|format.*check" apps/roam/src -B 2 -A 2Repository: DiscourseGraphs/discourse-graph
Length of output: 1338
🏁 Script executed:
# Check DiscourseNodeConfigPanel to see format field editing
cat -n apps/roam/src/components/settings/DiscourseNodeConfigPanel.tsx | head -200Repository: DiscourseGraphs/discourse-graph
Length of output: 8237
🏁 Script executed:
# Search for other uses of format field and regex construction
rg -n "getContentFromNodes|format.*RegExp|new RegExp" apps/roam/src/utilsRepository: DiscourseGraphs/discourse-graph
Length of output: 1790
🏁 Script executed:
# Check if there's any try-catch or error handling for regex in similar code
rg -n "try.*RegExp|RegExp.*catch" apps/roam/src -B 2 -A 2Repository: DiscourseGraphs/discourse-graph
Length of output: 57
🏁 Script executed:
# Find the validate function in NodeConfig
rg -n "const validate|function validate" apps/roam/src/components/settings/NodeConfig.tsx -B 5 -A 10Repository: DiscourseGraphs/discourse-graph
Length of output: 506
🏁 Script executed:
# Check getDiscourseNodeFormatExpression since it also processes format
cat -n apps/roam/src/utils/getDiscourseNodeFormatExpression.tsRepository: DiscourseGraphs/discourse-graph
Length of output: 544
🏁 Script executed:
# Look for format validation logic more broadly
rg -n "validate.*format|format.*valid" apps/roam/src -B 2 -A 5Repository: DiscourseGraphs/discourse-graph
Length of output: 2231
🏁 Script executed:
# Check if format field has any UI constraints or warnings
cat -n apps/roam/src/components/settings/NodeConfig.tsx | grep -A 30 -B 5 "format"Repository: DiscourseGraphs/discourse-graph
Length of output: 7712
Use the proper format escaping function to prevent ReDoS.
The getContentFromNodes function at lines 79-87 constructs a regex from the user-editable nodeFormat field with incomplete escaping. Unlike getDiscourseNodeFormatExpression which properly escapes regex metacharacters ([, ], ?, ., +), this code only escapes brackets, leaving other quantifiers unescaped. A malformed format like (a+)+b would create a regex with nested quantifiers, causing exponential backtracking.
Rather than adding a try-catch, replace the custom escaping with getDiscourseNodeFormatExpression, which already handles this safely.
🧰 Tools
🪛 ast-grep (0.40.3)
[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🤖 Prompt for AI Agents
In apps/roam/src/utils/exportUtils.ts around lines 79-87, the code builds a
RegExp from user-editable nodeFormat but only escapes brackets, leaving other
metacharacters and causing ReDoS risk; replace this custom escaping and regex
construction with the existing getDiscourseNodeFormatExpression utility that
properly escapes all regex metacharacters, then use the returned expression to
exec against title (preserve the same fallback to title when no match). Ensure
you import getDiscourseNodeFormatExpression if not already imported and remove
the manual replace chain so the format is safely escaped before creating/using
the regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, methodology question: On the one hand, we are saying we should have only one logical change per PR. I tried to make this PR do only the refactor, nothing else, so that you could see it as innocuous and focus on the substance in the other PRs.
On the other hand, moving code draws the attention to issues with previously existing code. Coderabbit pointed this out in its comments to 1032, and I made (a subset of) this fix in that PR originally (a subset because I was following pattern) and then moved it to 1213 instead of 1211 so that 1211 would not have unrelated changes.
(Accessorily, maybe we should do the full fix in both places.)
So the question: When we do a refactoring, and try to make it into a single PR for clarity, is fixing issues with old code part of the same task or an extra task? Honestly, I think it depends; I have argued earlier that it's too much trouble to revert anything done automatically by the tooling. But it's also true that it muddies the refactoring.
I don't care that much either way, but I'm curious about which rule I'm expected to follow here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maparent, this is a great question that highlights a real tension in software engineering practice. Here's my perspective on the tradeoffs:
Arguments for keeping refactoring PRs pure:
- Easier to review and understand the structural changes
- Clearer git history with distinct "what" and "why"
- Can be reverted independently if needed
- Reduces cognitive load for reviewers
Arguments for fixing issues during refactoring:
- Avoids touching the same code twice
- Issues are fresh in mind when you're already working with the code
- More efficient overall (fewer PRs, fewer context switches)
- Prevents shipping known issues in the interim
Practical heuristics I'd suggest:
-
Security or critical bugs: Fix immediately in the same PR. ReDoS vulnerabilities like the one I flagged fall into this category - if you're moving the code anyway, fixing it is the responsible choice.
-
Minor improvements or style issues: Keep separate unless trivial (e.g., automated formatting).
-
Structural issues revealed by the refactor: This is where judgment matters. If the refactoring exposes that the code fundamentally doesn't work right, fixing it in the same PR makes sense. If it's just "this could be better," separate PR.
-
Team conventions: Your retrieved learning shows you "take a focused approach to PR scope," which is a good default. However, pragmatism should win when the overhead of separation outweighs the benefits.
For this specific case: The ReDoS issue is a security concern that exists in code you're moving. I'd recommend fixing it in PR 1213 (or this PR if you consolidate) rather than leaving it unfixed. You could also note in the PR description "refactor + fix ReDoS in getContentFromNodes" so reviewers know to expect both.
What does your team prefer?
🧠 Learnings used
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in the discourse-graph codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-08-30T17:03:26.927Z
Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45
Timestamp: 2025-05-20T03:11:07.917Z
Learning: When reviewing SQL migrations in this codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I forgot to tag @mdroidian that question was addressed to him.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But @CodeRabbit good point about the security implications!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maparent, you're welcome! Security issues like ReDoS can be subtle but have real impact, especially in user-facing tools where format strings are configurable. Since you're already touching this code in the refactor, addressing it now makes sense.
If you do decide to fix it in this PR or PR 1213, using getDiscourseNodeFormatExpression (which already has the proper escaping logic) should make the fix straightforward. Let me know if you'd like me to review the fix once it's in place!
🧠 Learnings used
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-08-30T17:03:26.927Z
Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality
Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I forgot to tag @mdroidian that question was addressed to him.
This is probably best decided on a case by case basis depending on severity, effort, number of changes, etc.
Use your best judgement and just be sure to document the rationale with an inline GitHub comment.
| return name.length > maxFilenameLength | ||
| ? `${name.substring( | ||
| 0, | ||
| Math.ceil((maxFilenameLength - 3) / 2), | ||
| )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}` | ||
| : name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge case: Very small maxFilenameLength values.
If maxFilenameLength is less than or equal to 3, the calculation (maxFilenameLength - 3) / 2 could result in 0 or negative values, potentially producing unexpected output. Consider adding a guard or minimum value check.
🔎 Suggested guard
+ const effectiveMaxLength = Math.max(maxFilenameLength, 4);
return name.length > maxFilenameLength
? `${name.substring(
0,
- Math.ceil((maxFilenameLength - 3) / 2),
- )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}`
+ Math.ceil((effectiveMaxLength - 3) / 2),
+ )}...${name.slice(-Math.floor((effectiveMaxLength - 3) / 2))}`
: name;Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In apps/roam/src/utils/exportUtils.ts around lines 114 to 119, the truncation
logic can produce incorrect slices when maxFilenameLength <= 3; add a guard that
handles very small maxFilenameLength values first (e.g., if maxFilenameLength <=
3) and return a string of length maxFilenameLength composed of dots or a
truncated ellipsis, otherwise compute left/right slice sizes using Math.max(0,
Math.floor/ceil((maxFilenameLength - 3) / 2)) to avoid negative values and then
perform the substring + '...' + slice as before.
| blockRefsAsLinks = false, | ||
| blockAnchors = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused parameters: blockRefsAsLinks and blockAnchors.
These parameters are defined in the function signature (lines 225-226, 238-239) but never used in the function body. They may be placeholders for future functionality or remnants of removed code.
🔎 Options
- Remove the unused parameters if they're no longer needed:
- blockRefsAsLinks = false,
- blockAnchors = false,- Or add a TODO comment if they're planned for future implementation:
// TODO: Implement blockRefsAsLinks and blockAnchors functionalityAlso applies to: 238-239
🤖 Prompt for AI Agents
In apps/roam/src/utils/pageToMarkdown.ts around lines 225-226 and 238-239, the
parameters blockRefsAsLinks and blockAnchors are declared but never used; remove
these unused parameters from the function signature (and any related
overloads/call sites) if they are not needed, or if they are reserved for future
work add a single-line TODO comment next to each parameter in the signature and
keep them used as named options (or wire them into the implementation) so
linters/tests stop flagging them — pick one approach and apply it consistently
across all function definitions and callers.
mdroidian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets address the coderabbit suggestions (or resolve if unnecessary ), then re-tag for review.
https://linear.app/discourse-graphs/issue/ENG-1211/refactor-exportutils-factor-out-utility-functions-and-markdown
Summary by CodeRabbit
Release Notes
New Features
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.