ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647

maparent · 2025-12-25T17:16:46Z

https://linear.app/discourse-graphs/issue/ENG-1211/refactor-exportutils-factor-out-utility-functions-and-markdown

Summary by CodeRabbit

Release Notes

New Features
- Enhanced Markdown export with frontmatter support for improved document metadata
- Improved handling of embedded content, references, and links in exports
- Better content extraction and file naming for exported pages
- More flexible export formatting and link options
Refactor
- Consolidated export utilities for improved code maintainability

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…own. This is pure refactoring with no change of functionality.

linear · 2025-12-25T17:16:49Z

ENG-1211 Refactor exportUtils, factor out utility functions and markdown

supabase · 2025-12-25T17:16:55Z

This pull request has been ignored for the connected project zytfjzqyijgagqxrzbmz because there are no changes detected in packages/database/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.

Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

maparent · 2025-12-25T18:39:29Z

Explanations about the breakup here: https://www.loom.com/share/afa48744aa0441628ec182a2147e8321
Also note: The coderabbit work was done in 1032 , this is just splitting off the code changes.

mdroidian · 2025-12-25T21:16:50Z

@coderabbitai full review

coderabbitai · 2025-12-25T21:16:55Z

✅ Actions performed

Full review triggered.

coderabbitai · 2025-12-25T21:22:33Z

📝 Walkthrough

Walkthrough

This PR introduces new utility modules for exporting Roam content to various formats. It creates exportUtils.ts with helpers for page data extraction, filename generation, and block conversion, and pageToMarkdown.ts for rendering Roam blocks to Markdown with discourse context and frontmatter support. The existing getExportTypes.ts is refactored to delegate to these utilities, eliminating local implementations.

Changes

Cohort / File(s)	Change Summary
New Export Utilities `apps/roam/src/utils/exportUtils.ts`, `apps/roam/src/utils/pageToMarkdown.ts`	Added two new utility modules: `exportUtils.ts` exports 7 functions for array deduplication, page data extraction, content extraction from discourse nodes, filename generation with sanitization, link format conversion, Roam block-to-tree conversion, and recursive UID collection; `pageToMarkdown.ts` exports markdown rendering with support for frontmatter generation, discourse context fetching, embed/reference handling, view type mapping, and cross-page reference evaluation
Refactored Export Logic `apps/roam/src/utils/getExportTypes.ts`	Refactored to import and use new utilities; removed local implementations for block pulling, filename generation, markdown rendering, and frontmatter handling; added early return guards for empty results in export format callbacks; removed exported `DiscourseExportResult` type; now centralizes page data extraction via `getPageData()` helper

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

#265: Modifies markdown export embedding logic (EMBED\_REGEX, EMBED\_CHILDREN\_REGEX, and embed handling in toMarkdown).

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: refactoring exportUtils and factoring out utility functions and markdown into separate modules.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (3)

apps/roam/src/utils/getExportTypes.ts (1)

224-241: TODO comment indicates future consolidation opportunity.

The comment at line 224 notes that treeNodeToMarkdown logic should be reused with the markdown export. Since pageToMarkdown.ts now exists with a toMarkdown function, this could potentially be addressed in a follow-up PR to reduce duplication between the PDF and Markdown exports.

Would you like me to open an issue to track consolidating the PDF export's treeNodeToMarkdown with the existing toMarkdown from pageToMarkdown.ts?

apps/roam/src/utils/exportUtils.ts (1)

8-17: Consider performance implications of uniqJsonArray.

The implementation uses JSON.stringify with sorted entries for deduplication, which works correctly but has O(n * m log m) complexity where m is the number of keys per object. For large arrays with complex objects, this could be slow.

For the current use case (filtering relation data), this should be acceptable, but worth noting if used with larger datasets.
apps/roam/src/utils/pageToMarkdown.ts (1)
96-110: YAML frontmatter escaping is limited to the text field.

Line 101-102 correctly escapes quotes in the text field, but other dynamic fields (line 104) are inserted without escaping. Values containing YAML special characters (:, \n, #, etc.) could produce malformed YAML.

Since this is for user-controlled exports, it's not a security issue, but could cause parsing errors when importing the exported markdown.
🔎 Optional: Comprehensive YAML escaping
       if (capt === "text") {
-        // Wrap title in quotes and escape additional quotes
-        const escapedText = result[capt].toString().replace(/"/g, '\\"');
-        return `"${escapedText}"`;
+        const escapedText = result[capt].toString().replace(/"/g, '\\"');
+        return `"${escapedText}"`;
       }
-      return result[capt].toString();
+      const value = result[capt].toString();
+      // Quote values containing YAML special characters
+      if (/[:\n#[\]{}|>&*!?]/.test(value) || value.includes('"')) {
+        return `"${value.replace(/"/g, '\\"')}"`;
+      }
+      return value;

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dfa920 and 4d811e4.

📒 Files selected for processing (3)

apps/roam/src/utils/exportUtils.ts
apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/main.mdc)

**/*.{ts,tsx}: Use Tailwind CSS for styling where possible
When refactoring inline styles, use tailwind classes
Prefer type over interface in TypeScript
Use explicit return types for functions
Avoid any types when possible
Prefer arrow functions over regular function declarations
Use named parameters (object destructuring) when a function has more than 2 parameters
Use PascalCase for components and types
Use camelCase for variables and functions
Use UPPERCASE for constants
Function names should describe their purpose clearly
Prefer early returns over nested conditionals for better readability

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**/*.{js,ts,tsx,jsx,json}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Prefer existing dependencies from package.json when working on the Roam Research extension

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**/*.{ts,tsx,jsx,js,css,scss}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

apps/roam/**/*.{ts,tsx,js,jsx}: Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality
Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

apps/roam/**

📄 CodeRabbit inference engine (.cursor/rules/roam.mdc)

Implement the Discourse Graph protocol in the Roam Research extension

Files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

🧠 Learnings (10)

📚 Learning: 2025-06-22T10:40:52.752Z

Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-07-13T16:47:14.352Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-07-13T16:47:14.352Z
Learning: In the discourse-graph codebase, types.gen.ts contains automatically generated database function type definitions that may have reordered signatures between versions. This reordering is expected behavior from the code generation process and should not be flagged as an issue requiring fixes.

Applied to files:

apps/roam/src/utils/getExportTypes.ts

📚 Learning: 2025-07-21T14:22:20.752Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 291
File: packages/database/supabase/functions/create-space/index.ts:0-0
Timestamp: 2025-07-21T14:22:20.752Z
Learning: In the discourse-graph codebase, types.gen.ts is not accessible from Supabase edge functions, requiring duplication of types and utilities when needed in the edge function environment at packages/database/supabase/functions/.

Applied to files:

apps/roam/src/utils/getExportTypes.ts

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts
apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-11-25T00:52:41.934Z

Learnt from: CR
Repo: DiscourseGraphs/discourse-graph PR: 0
File: .cursor/rules/roam.mdc:0-0
Timestamp: 2025-11-25T00:52:41.934Z
Learning: Applies to apps/roam/**/*.{ts,tsx,jsx,js,css,scss} : Use BlueprintJS 3 components and Tailwind CSS for platform-native UI in the Roam Research extension

Applied to files:

apps/roam/src/utils/getExportTypes.ts

📚 Learning: 2025-11-05T21:57:14.909Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.

Applied to files:

apps/roam/src/utils/getExportTypes.ts
apps/roam/src/utils/pageToMarkdown.ts

📚 Learning: 2025-12-07T20:54:20.007Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 559
File: apps/roam/src/utils/findDiscourseNode.ts:37-39
Timestamp: 2025-12-07T20:54:20.007Z
Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-17T23:37:45.289Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Applied to files:

apps/roam/src/utils/exportUtils.ts

📚 Learning: 2025-06-17T23:37:45.289Z

Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.

Applied to files:

apps/roam/src/utils/exportUtils.ts

🧬 Code graph analysis (2)

apps/roam/src/utils/getExportTypes.ts (4)

apps/roam/src/utils/types.ts (1)

Result (42-46)

apps/roam/src/utils/exportUtils.ts (1)

getPageData (19-67)

apps/roam/src/utils/getExportSettings.ts (1)

getExportSettings (112-125)

apps/roam/src/utils/pageToMarkdown.ts (1)

pageToMarkdown (212-351)

apps/roam/src/utils/exportUtils.ts (1)

apps/roam/src/utils/types.ts (1)

Result (42-46)

🪛 ast-grep (0.40.3)

apps/roam/src/utils/exportUtils.ts

[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (10)

apps/roam/src/utils/getExportTypes.ts (3)

14-22: Clean modular imports from new utility files.

The refactoring correctly imports the required utilities from the new modules. This aligns well with the PR's goal of factoring out utility functions.

84-132: Markdown export callback cleanly refactored.

The callback properly delegates to pageToMarkdown while preserving the progress update pattern and sequential processing. The early guard at line 85 improves robustness.

55-79: Incorrect claim about Neo4j export parameter.

The review incorrectly states that Neo4j (line 152) includes isExportDiscourseGraph, but it doesn't. The actual pattern shows: JSON and Neo4j both omit the parameter (filtering results to discourse nodes), while Markdown and PDF include it (exporting all content). This appears to be intentional design: structured data exports filter by node type, while document exports preserve all content.

Likely an incorrect or invalid review comment.

apps/roam/src/utils/exportUtils.ts (2)

122-129: Good link type handling with appropriate fallback.

The toLink function correctly handles the three link types (wikilinks, alias, roam url) with a sensible plain filename fallback. The extension removal for wikilinks and roam URLs matches expected Markdown/wiki conventions.

131-153: Well-structured recursive tree conversion utilities.

pullBlockToTreeNode correctly maps Roam's PullBlock structure to the TreeNode interface with appropriate defaults. collectUids provides a clean recursive UID collection. Both functions are idiomatic and readable.

apps/roam/src/utils/pageToMarkdown.ts (5)

24-34: Clear regex patterns for Roam embed syntax.

The MATCHES_NONE pattern (/$.+^/) is a clever approach to create a regex that never matches, avoiding conditional logic. The embed regexes correctly capture the 9-10 character block UIDs. Good documentation comments explain the syntax being matched.

42-75: Clean conditional discourse context handling.

The function uses early returns appropriately and cleanly handles the optional appendRefNodeContext feature. The logic for finding and appending referenced discourse nodes is well-structured.

142-161: Comprehensive text processing with proper ordering.

The text processing chain correctly handles:

Embeds and embed-children (with recursion)

Block references

TODO/DONE checkbox conversion

Roam italic syntax conversion

Code block formatting

The use of MATCHES_NONE to conditionally skip regex replacements is an elegant pattern.

280-348: Well-structured content assembly with clear section formatting.

The template literal construction correctly builds the markdown document with:

Frontmatter

Main content from tree children

Optional Discourse Context section

Optional References section

The conditional sections use appropriate guards and formatting.

253-264: Verify the escaping behavior of normalizePageTitle from the roamjs-components library.

The concern about query injection via string interpolation at line 256 is theoretically valid—if normalizePageTitle doesn't escape quotes in page titles, a title like foo"bar would break the Datalog syntax. However, the implementation of normalizePageTitle from the external roamjs-components (v0.85.7) library cannot be verified from public sources. The pattern of string interpolation is consistent throughout the codebase for various user inputs, and no documented security issues exist for this pattern in Roam API calls. Clarify whether normalizePageTitle properly escapes special characters (particularly quotes) for use in Datalog string contexts, or consider using Datalog's parameter binding (:in $ clause) as a safer alternative if available.

coderabbitai · 2025-12-25T21:22:36Z

apps/roam/src/utils/exportUtils.ts

+  const allResults = results || [];
+
+  if (isExportDiscourseGraph) return allResults as DiscourseExportResult[];
+
+  const matchedTexts = new Set();
+  return allNodes.flatMap((n) =>
+    (allResults
+      ? allResults.flatMap((r) =>
+          Object.keys(r)
+            .filter((k) => k.endsWith(`-uid`) && k !== "text-uid")
+            .map((k) => ({
+              ...r,
+              text: r[k.slice(0, -4)].toString(),
+              uid: r[k] as string,
+            }))
+            .concat({
+              text: r.text,
+              uid: r.uid,
+            }),
+        )
+      : (
+          window.roamAlphaAPI.q(
+            "[:find (pull ?e [:block/uid :node/title]) :where [?e :node/title _]]",
+          ) as [Record<string, string>][]
+        ).map(([{ title, uid }]) => ({
+          text: title,
+          uid,
+        }))


⚠️ Potential issue | 🟡 Minor

Redundant truthiness check on allResults.

Line 28 assigns const allResults = results || [], ensuring allResults is always an array. The ternary at line 34 (allResults ? ... : ...) will therefore always take the truthy branch, making the fallback Roam query (lines 48-55) unreachable dead code.

🔎 Suggested fix

Either remove the dead code branch, or if the fallback is intentional for when results is explicitly undefined, change line 28 to preserve that distinction:

- const allResults = results || []; + const allResults = results; - if (isExportDiscourseGraph) return allResults as DiscourseExportResult[]; + if (isExportDiscourseGraph) return (allResults || []) as DiscourseExportResult[];

Or simply remove lines 48-55 if they're no longer needed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const allResults = results || [];

if (isExportDiscourseGraph) return allResults as DiscourseExportResult[];

const matchedTexts = new Set();

return allNodes.flatMap((n) =>

(allResults

? allResults.flatMap((r) =>

Object.keys(r)

.filter((k) => k.endsWith(`-uid`) && k !== "text-uid")

.map((k) => ({

...r,

text: r[k.slice(0, -4)].toString(),

uid: r[k] as string,

}))

.concat({

text: r.text,

uid: r.uid,

}),

)

: (

window.roamAlphaAPI.q(

"[:find (pull ?e [:block/uid :node/title]) :where [?e :node/title _]]",

) as [Record<string, string>][]

).map(([{ title, uid }]) => ({

text: title,

uid,

}))

const allResults = results;

if (isExportDiscourseGraph) return (allResults || []) as DiscourseExportResult[];

const matchedTexts = new Set();

return allNodes.flatMap((n) =>

(allResults

? allResults.flatMap((r) =>

Object.keys(r)

.filter((k) => k.endsWith(`-uid`) && k !== "text-uid")

.map((k) => ({

...r,

text: r[k.slice(0, -4)].toString(),

uid: r[k] as string,

}))

.concat({

text: r.text,

uid: r.uid,

}),

)

: (

window.roamAlphaAPI.q(

"[:find (pull ?e [:block/uid :node/title]) :where [?e :node/title _]]",

) as [Record<string, string>][]

).map(([{ title, uid }]) => ({

text: title,

uid,

}))

🤖 Prompt for AI Agents

In apps/roam/src/utils/exportUtils.ts around lines 28 to 55, the code does `const allResults = results || []` then uses a ternary `allResults ? ... : ...`, making the false branch unreachable; either remove the unreachable Roam query fallback branch (lines ~48-55) or preserve the distinction between undefined and empty array by changing the assignment to keep undefined when results is undefined (e.g., `const allResults = results;`), then update the ternary accordingly so the fallback executes only when results is truly undefined.

coderabbitai · 2025-12-25T21:22:36Z

apps/roam/src/utils/exportUtils.ts

+  if (!nodeFormat) return title;
+  const regex = new RegExp(
+    `^${nodeFormat
+      .replace(/\[/g, "\\[")
+      .replace(/]/g, "\\]")
+      .replace("{content}", "(.*?)")
+      .replace(/{[^}]+}/g, "(?:.*?)")}$`,
+  );
+  return regex.exec(title)?.[1] || title;


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find DiscourseNode type definition and format field rg -n "type DiscourseNode|interface DiscourseNode" -A 20

Repository: DiscourseGraphs/discourse-graph

Length of output: 22213

🏁 Script executed:

# Check how format patterns are created/validated in the codebase rg -n "\.format\s*=" apps/roam/src -B 2 -A 2

Repository: DiscourseGraphs/discourse-graph

Length of output: 57

🏁 Script executed:

# Look at the full context of exportUtils.ts around lines 79-87 cat -n apps/roam/src/utils/exportUtils.ts | head -100 | tail -50

Repository: DiscourseGraphs/discourse-graph

Length of output: 1584

🏁 Script executed:

# Check if there's any validation of format field when it's set/saved rg -n "format" apps/roam/src -l | head -20

Repository: DiscourseGraphs/discourse-graph

Length of output: 933

🏁 Script executed:

# Search for format field validation or sanitization rg -n "format.*validate|validateFormat|format.*check" apps/roam/src -B 2 -A 2

Repository: DiscourseGraphs/discourse-graph

Length of output: 1338

🏁 Script executed:

# Check DiscourseNodeConfigPanel to see format field editing cat -n apps/roam/src/components/settings/DiscourseNodeConfigPanel.tsx | head -200

Repository: DiscourseGraphs/discourse-graph

Length of output: 8237

🏁 Script executed:

# Search for other uses of format field and regex construction rg -n "getContentFromNodes|format.*RegExp|new RegExp" apps/roam/src/utils

Repository: DiscourseGraphs/discourse-graph

Length of output: 1790

🏁 Script executed:

# Check if there's any try-catch or error handling for regex in similar code rg -n "try.*RegExp|RegExp.*catch" apps/roam/src -B 2 -A 2

Repository: DiscourseGraphs/discourse-graph

Length of output: 57

🏁 Script executed:

# Find the validate function in NodeConfig rg -n "const validate|function validate" apps/roam/src/components/settings/NodeConfig.tsx -B 5 -A 10

Repository: DiscourseGraphs/discourse-graph

Length of output: 506

🏁 Script executed:

# Check getDiscourseNodeFormatExpression since it also processes format cat -n apps/roam/src/utils/getDiscourseNodeFormatExpression.ts

Repository: DiscourseGraphs/discourse-graph

Length of output: 544

🏁 Script executed:

# Look for format validation logic more broadly rg -n "validate.*format|format.*valid" apps/roam/src -B 2 -A 5

Repository: DiscourseGraphs/discourse-graph

Length of output: 2231

🏁 Script executed:

# Check if format field has any UI constraints or warnings cat -n apps/roam/src/components/settings/NodeConfig.tsx | grep -A 30 -B 5 "format"

Repository: DiscourseGraphs/discourse-graph

Length of output: 7712

Use the proper format escaping function to prevent ReDoS.

The getContentFromNodes function at lines 79-87 constructs a regex from the user-editable nodeFormat field with incomplete escaping. Unlike getDiscourseNodeFormatExpression which properly escapes regex metacharacters ([, ], ?, ., +), this code only escapes brackets, leaving other quantifiers unescaped. A malformed format like (a+)+b would create a regex with nested quantifiers, causing exponential backtracking.

Rather than adding a try-catch, replace the custom escaping with getDiscourseNodeFormatExpression, which already handles this safely.

🧰 Tools

🪛 ast-grep (0.40.3)

[warning] 79-85: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${nodeFormat .replace(/\[/g, "\\[") .replace(/]/g, "\\]") .replace("{content}", "(.*?)") .replace(/{[^}]+}/g, "(?:.*?)")}$,
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents

In apps/roam/src/utils/exportUtils.ts around lines 79-87, the code builds a RegExp from user-editable nodeFormat but only escapes brackets, leaving other metacharacters and causing ReDoS risk; replace this custom escaping and regex construction with the existing getDiscourseNodeFormatExpression utility that properly escapes all regex metacharacters, then use the returned expression to exec against title (preserve the same fallback to title when no match). Ensure you import getDiscourseNodeFormatExpression if not already imported and remove the manual replace chain so the format is safely escaped before creating/using the regex.

Ok, methodology question: On the one hand, we are saying we should have only one logical change per PR. I tried to make this PR do only the refactor, nothing else, so that you could see it as innocuous and focus on the substance in the other PRs.
On the other hand, moving code draws the attention to issues with previously existing code. Coderabbit pointed this out in its comments to 1032, and I made (a subset of) this fix in that PR originally (a subset because I was following pattern) and then moved it to 1213 instead of 1211 so that 1211 would not have unrelated changes.
(Accessorily, maybe we should do the full fix in both places.)
So the question: When we do a refactoring, and try to make it into a single PR for clarity, is fixing issues with old code part of the same task or an extra task? Honestly, I think it depends; I have argued earlier that it's too much trouble to revert anything done automatically by the tooling. But it's also true that it muddies the refactoring.
I don't care that much either way, but I'm curious about which rule I'm expected to follow here.

@maparent, this is a great question that highlights a real tension in software engineering practice. Here's my perspective on the tradeoffs:

Arguments for keeping refactoring PRs pure:

Easier to review and understand the structural changes

Clearer git history with distinct "what" and "why"

Can be reverted independently if needed

Reduces cognitive load for reviewers

Arguments for fixing issues during refactoring:

Avoids touching the same code twice

Issues are fresh in mind when you're already working with the code

More efficient overall (fewer PRs, fewer context switches)

Prevents shipping known issues in the interim

Practical heuristics I'd suggest:

Security or critical bugs: Fix immediately in the same PR. ReDoS vulnerabilities like the one I flagged fall into this category - if you're moving the code anyway, fixing it is the responsible choice.

Minor improvements or style issues: Keep separate unless trivial (e.g., automated formatting).

Structural issues revealed by the refactor: This is where judgment matters. If the refactoring exposes that the code fundamentally doesn't work right, fixing it in the same PR makes sense. If it's just "this could be better," separate PR.

Team conventions: Your retrieved learning shows you "take a focused approach to PR scope," which is a good default. However, pragmatism should win when the overhead of separation outweighs the benefits.

For this specific case: The ReDoS issue is a security concern that exists in code you're moving. I'd recommend fixing it in PR 1213 (or this PR if you consolidate) rather than leaving it unfixed. You could also note in the PR description "refactor + fix ReDoS in getContentFromNodes" so reviewers know to expect both.

What does your team prefer?

🧠 Learnings used

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 232 File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31 Timestamp: 2025-06-22T10:40:52.752Z Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 559 File: apps/roam/src/utils/findDiscourseNode.ts:37-39 Timestamp: 2025-12-07T20:54:20.007Z Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 220 File: apps/roam/src/utils/conceptConversion.ts:42-56 Timestamp: 2025-06-17T23:37:45.289Z Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 372 File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116 Timestamp: 2025-08-25T15:53:21.799Z Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 165 File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45 Timestamp: 2025-05-20T03:11:07.917Z Learning: When reviewing SQL migrations in the discourse-graph codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 0 File: :0-0 Timestamp: 2025-08-30T17:03:26.927Z Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 165 File: packages/database/supabase/migrations/20250504202930_content_tables.sql:37-45 Timestamp: 2025-05-20T03:11:07.917Z Learning: When reviewing SQL migrations in this codebase, understand that historical migration files should not be modified once they're applied. Instead, issues should be fixed in subsequent migrations or schema definition files. This is why maparent indicated "migration, not changing. corrected in schema."

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Sorry I forgot to tag @mdroidian that question was addressed to him.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

But @CodeRabbit good point about the security implications!

@maparent, you're welcome! Security issues like ReDoS can be subtle but have real impact, especially in user-facing tools where format strings are configurable. Since you're already touching this code in the refactor, addressing it now makes sense.

If you do decide to fix it in this PR or PR 1213, using getDiscourseNodeFormatExpression (which already has the proper escaping logic) should make the fix straightforward. Let me know if you'd like me to review the fix once it's in place!

🧠 Learnings used

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 232 File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31 Timestamp: 2025-06-22T10:40:52.752Z Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 559 File: apps/roam/src/utils/findDiscourseNode.ts:37-39 Timestamp: 2025-12-07T20:54:20.007Z Learning: In apps/roam/src/utils/findDiscourseNode.ts, the function findDiscourseNodeByTitleAndUid accepts both uid and title parameters where uid is primarily used for cache access (as the cache key) while title is used for the actual matching via matchDiscourseNode. This design reflects the pattern where downstream, the uid is mostly used to fetch the title, so the function caches by uid but matches by title.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 220 File: apps/roam/src/utils/conceptConversion.ts:42-56 Timestamp: 2025-06-17T23:37:45.289Z Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.

Learnt from: sid597 Repo: DiscourseGraphs/discourse-graph PR: 372 File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116 Timestamp: 2025-08-25T15:53:21.799Z Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.

Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 0 File: :0-0 Timestamp: 2025-08-30T17:03:26.927Z Learning: For maparent: Takes a focused approach to PR scope - acknowledges good suggestions that are out of scope for current PR rather than expanding scope.

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use Roam Depot/Extension API docs from https://roamresearch.com/#/app/developer-documentation/page/y31lhjIqU when implementing extension functionality

Learnt from: CR Repo: DiscourseGraphs/discourse-graph PR: 0 File: .cursor/rules/roam.mdc:0-0 Timestamp: 2025-11-25T00:52:41.934Z Learning: Applies to apps/roam/**/*.{ts,tsx,js,jsx} : Use the roamAlphaApi docs from https://roamresearch.com/#/app/developer-documentation/page/tIaOPdXCj when implementing Roam functionality

Sorry I forgot to tag @mdroidian that question was addressed to him.

This is probably best decided on a case by case basis depending on severity, effort, number of changes, etc.

Use your best judgement and just be sure to document the rationale with an inline GitHub comment.

coderabbitai · 2025-12-25T21:22:37Z

apps/roam/src/utils/exportUtils.ts

+  return name.length > maxFilenameLength
+    ? `${name.substring(
+        0,
+        Math.ceil((maxFilenameLength - 3) / 2),
+      )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}`
+    : name;


⚠️ Potential issue | 🟡 Minor

Edge case: Very small maxFilenameLength values.

If maxFilenameLength is less than or equal to 3, the calculation (maxFilenameLength - 3) / 2 could result in 0 or negative values, potentially producing unexpected output. Consider adding a guard or minimum value check.

🔎 Suggested guard

+ const effectiveMaxLength = Math.max(maxFilenameLength, 4); return name.length > maxFilenameLength ? `${name.substring( 0, - Math.ceil((maxFilenameLength - 3) / 2), - )}...${name.slice(-Math.floor((maxFilenameLength - 3) / 2))}` + Math.ceil((effectiveMaxLength - 3) / 2), + )}...${name.slice(-Math.floor((effectiveMaxLength - 3) / 2))}` : name;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In apps/roam/src/utils/exportUtils.ts around lines 114 to 119, the truncation logic can produce incorrect slices when maxFilenameLength <= 3; add a guard that handles very small maxFilenameLength values first (e.g., if maxFilenameLength <= 3) and return a string of length maxFilenameLength composed of dots or a truncated ellipsis, otherwise compute left/right slice sizes using Math.max(0, Math.floor/ceil((maxFilenameLength - 3) / 2)) to avoid negative values and then perform the substring + '...' + slice as before.

coderabbitai · 2025-12-25T21:22:37Z

apps/roam/src/utils/pageToMarkdown.ts

+    blockRefsAsLinks = false,
+    blockAnchors = false,


⚠️ Potential issue | 🟡 Minor

Unused parameters: blockRefsAsLinks and blockAnchors.

These parameters are defined in the function signature (lines 225-226, 238-239) but never used in the function body. They may be placeholders for future functionality or remnants of removed code.

🔎 Options

Remove the unused parameters if they're no longer needed:

- blockRefsAsLinks = false, - blockAnchors = false,

Or add a TODO comment if they're planned for future implementation:

// TODO: Implement blockRefsAsLinks and blockAnchors functionality

Also applies to: 238-239

🤖 Prompt for AI Agents

In apps/roam/src/utils/pageToMarkdown.ts around lines 225-226 and 238-239, the parameters blockRefsAsLinks and blockAnchors are declared but never used; remove these unused parameters from the function signature (and any related overloads/call sites) if they are not needed, or if they are reserved for future work add a single-line TODO comment next to each parameter in the signature and keep them used as named options (or wire them into the implementation) so linters/tests stop flagging them — pick one approach and apply it consistently across all function definitions and callers.

mdroidian

Lets address the coderabbit suggestions (or resolve if unnecessary ), then re-tag for review.

eng-1211 Refactor exportUtils, factor out utility functions and markd…

5970ac4

…own. This is pure refactoring with no change of functionality.

unnecessary await

4d811e4

maparent requested a review from mdroidian December 25, 2025 18:38

maparent mentioned this pull request Dec 25, 2025

ENG-1032 Minimal JSON-LD export of the discourse-graph #529

Draft

coderabbitai bot reviewed Dec 25, 2025

View reviewed changes

mdroidian requested changes Dec 25, 2025

View reviewed changes

ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647

Are you sure you want to change the base?

ENG-1211 Refactor exportUtils, factor out utility functions and markdown #647

Conversation

maparent commented Dec 25, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

linear bot commented Dec 25, 2025

Uh oh!

supabase bot commented Dec 25, 2025

Uh oh!

maparent commented Dec 25, 2025

Uh oh!

mdroidian commented Dec 25, 2025

Uh oh!

coderabbitai bot commented Dec 25, 2025

Uh oh!

coderabbitai bot commented Dec 25, 2025

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maparent Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

maparent Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

maparent Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

mdroidian Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 25, 2025

Choose a reason for hiding this comment

Uh oh!

mdroidian left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

maparent commented Dec 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot Dec 25, 2025 •

edited

Loading