Conversation
…g newlines (BLO-1065) - Added a targeted DOM preprocessing step before ProseMirror parsing. - Explictly replicates CSS white-space: normal behavior internally to fix MS Word line breaks. - Retains preserveWhitespace: true in PM to satisfy AI diffing constraints from PR #2230. - Skips Notion HTML to preserve intentional hard breaks.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Background Context: Why DOM Preprocessing?This PR resolves a conflict between two parsing requirements:
To support requirement 2, PR #2230 set ProseMirror's Why not change
Conclusion: |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR introduces HTML whitespace preprocessing functionality to normalize non-Notion HTML before parsing. A new utility module detects Notion-specific HTML markers and collapses whitespace in text nodes while preserving content within code elements. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
@blocknote/ariakit
@blocknote/code-block
@blocknote/core
@blocknote/mantine
@blocknote/react
@blocknote/server-util
@blocknote/shadcn
@blocknote/xl-ai
@blocknote/xl-docx-exporter
@blocknote/xl-email-exporter
@blocknote/xl-multi-column
@blocknote/xl-odt-exporter
@blocknote/xl-pdf-exporter
commit: |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
wordcopy.txt (1)
11-21: Local file paths in test fixture.This file contains local filesystem paths (
file:////Users/yousef/Library/...) which are not functionally relevant and could be removed or anonymized for cleaner test data. These paths won't affect the parsing since they're in<link>elements that don't impact the text content extraction.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wordcopy.txt` around lines 11 - 21, The test fixture contains literal local file URIs in the <link> elements (e.g., the file:////Users/yousef/... values in the href attributes) which should be removed or anonymized; edit wordcopy.txt to either delete those <link rel=File-List>, <link rel=themeData> and <link rel=colorSchemeMapping> lines or replace their href values with neutral/dummy paths (e.g., relative or placeholder URIs) so the fixture no longer contains host-specific local filesystem paths while preserving the structure for parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@wordcopy.txt`:
- Around line 11-21: The test fixture contains literal local file URIs in the
<link> elements (e.g., the file:////Users/yousef/... values in the href
attributes) which should be removed or anonymized; edit wordcopy.txt to either
delete those <link rel=File-List>, <link rel=themeData> and <link
rel=colorSchemeMapping> lines or replace their href values with neutral/dummy
paths (e.g., relative or placeholder URIs) so the fixture no longer contains
host-specific local filesystem paths while preserving the structure for parsing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e26020a-caff-4045-ae6e-4a1ae93907d8
📒 Files selected for processing (7)
packages/core/src/api/parsers/html/parseHTML.tspackages/core/src/api/parsers/html/util/normalizeWhitespace.tstests/src/unit/core/clipboard/paste/pasteTestInstances.tstests/src/unit/core/formatConversion/parse/__snapshots__/html/mixedTextTableCell.jsontests/src/unit/core/formatConversion/parse/__snapshots__/html/msWordPaste.jsontests/src/unit/core/formatConversion/parse/parseTestInstances.tswordcopy.txt
…rd-is-broken-blo-1065
| S extends StyleSchema, | ||
| >(html: string, pmSchema: Schema): Block<BSchema, I, S>[] { | ||
| const htmlNode = nestedListsToBlockNoteStructure(html); | ||
| preprocessHTMLWhitespace(htmlNode); |
There was a problem hiding this comment.
Why here & not as part of nestesListToBlockNoteStructure?
nperez0111
left a comment
There was a problem hiding this comment.
We are slowly creeping more & more into preprocess & postprocess steps.
This is fine for now, I hate the need for content detection, but if we have conflicting structures to support then so be it
Summary
Fixes the issue where copying and pasting from MS Word (or other sources with HTML source formatting) introduced extra, unintended line breaks. (closes #2548, closes #2356)
Rationale
When HTML is pasted, source code line breaks (`\n`) within elements like `
` were being converted into hard breaks in BlockNote. While standard HTML collapses these into spaces, PR #2230 introduced `preserveWhitespace: true` to prevent the stripping of leading and trailing spaces during AI diffing. This global flag disabled all whitespace collapsing, causing every formatting newline to appear as a visible hard break. This PR introduces a targeted preprocessing step to resolve both needs simultaneously.
Changes
Impact
Fixes pasting from MS Word and preserves the behavior required by PR #2230 for AI diffing. No negative impacts expected; Notion pasting behavior is fully preserved.
Testing
Screenshots/Video
N/A
Checklist
Additional Notes
See PR comment for more technical details on why native ProseMirror `preserveWhitespace` settings were insufficient.
Summary by CodeRabbit
Bug Fixes
Tests