Fix: Add leading and trailing pipes to markdown tables for GFM compliance #1752

Open
PatD42 wants to merge 1 commit into unclecode:main from PatD42:fix/table-gfm-pipes
PatD42 commented Feb 8, 2026

Summary

Fixes #1731 - Adds leading and trailing pipe delimiters to markdown tables for GitHub Flavored Markdown (GFM) compliance.

List of files changed and why

  • crawl4ai/html2text/__init__.py (lines 717-733) - Fixed table generation logic to always include boundary pipes
  • tests/test_table_gfm_compliance.py - Added 8 comprehensive unit tests to verify GFM compliance

Changes

Before

Parameter | Guideline
---|---
Value1 | Value2

After

| Parameter | Guideline |
| --- | --- |
| Value1 | Value2 |

Technical Changes

  1. Line 719: Removed the first-cell condition so that | is output before every cell (including the first)
  2. Line 726: Added self.o(" |") to close each row with a trailing pipe
  3. Line 731: Fixed separator format from ---|--- to | --- | --- |
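
The three changes above can be illustrated with a minimal standalone sketch. This is not the code in crawl4ai/html2text/__init__.py: the function names render_row, render_separator, and render_table are hypothetical and only mirror the described behavior (a pipe before every cell, a closing " |" on each row, and a separator of the form | --- | --- |).

```python
from typing import List


def render_row(cells: List[str]) -> str:
    """Render one row with a pipe before every cell and a closing pipe.

    Hypothetical helper for illustration; not part of crawl4ai.
    """
    # Leading "| " covers the first cell (change 1); trailing " |" closes the row (change 2).
    return "| " + " | ".join(cell.strip() for cell in cells) + " |"


def render_separator(column_count: int) -> str:
    """Render the header separator with boundary pipes (change 3)."""
    return "| " + " | ".join(["---"] * column_count) + " |"


def render_table(header: List[str], rows: List[List[str]]) -> str:
    """Assemble header, separator, and body rows into one table string."""
    lines = [render_row(header), render_separator(len(header))]
    lines.extend(render_row(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(["Parameter", "Guideline"], [["Value1", "Value2"]]))
```

Building each row from a single join keeps the first cell on the same code path as every other cell, which is the point of removing the first-cell special case.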

Why This Fix (Not pad_tables=True)

  • pad_tables=False is the default setting
  • pad_tables is designed for column alignment, not structural GFM compliance
  • Table generation should produce correct output by default
  • This fix works with both pad_tables=True and False

How Has This Been Tested?

Unit Tests (8 tests, listed below):

  • test_table_has_leading_pipes - Verifies all rows start with |
  • test_table_has_trailing_pipes - Verifies all rows end with |
  • test_separator_row_has_pipes - Verifies separator format | --- | --- |
  • test_works_with_pad_tables_false - Tests with default setting
  • test_works_with_pad_tables_true - Tests with padding enabled
  • test_multirow_table - Tests 4-row table with 3 columns
  • test_single_column_table - Tests edge case
  • test_empty_cells - Tests tables with empty cells

Manual Testing:

from crawl4ai.html2text import HTML2Text

html = '<table><tr><th>A</th><th>B</th></tr><tr><td>1</td><td>2</td></tr></table>'
h = HTML2Text()
result = h.handle(html)
print(result)
# Output: | A | B |
#         | --- | --- |
#         | 1 | 2 |

All tests verify that:

  • Every table row has leading and trailing pipes
  • Works with both pad_tables settings
  • Handles edge cases (empty cells, single column, multi-row)
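
The properties listed above can also be checked on any markdown string without importing the library. The following is a rough sketch under stated assumptions: the helper names has_boundary_pipes, is_separator_row, and table_has_boundary_pipes are hypothetical and not taken from tests/test_table_gfm_compliance.py, the input is assumed to contain a single table, and escaped pipes inside cells are not handled.

```python
import re

# A separator cell is a run of dashes with optional alignment colons, e.g. --- or :---:
_SEPARATOR_CELL = re.compile(r"^:?-+:?$")


def has_boundary_pipes(line: str) -> bool:
    """True if the line starts and ends with a pipe (ignoring outer whitespace)."""
    stripped = line.strip()
    return len(stripped) >= 2 and stripped.startswith("|") and stripped.endswith("|")


def is_separator_row(line: str) -> bool:
    """True if the line looks like | --- | --- | with boundary pipes."""
    stripped = line.strip()
    if not has_boundary_pipes(stripped):
        return False
    cells = [cell.strip() for cell in stripped[1:-1].split("|")]
    return all(_SEPARATOR_CELL.match(cell) for cell in cells)


def table_has_boundary_pipes(markdown: str) -> bool:
    """Check that every non-blank line has boundary pipes and line 2 is a separator."""
    lines = [ln for ln in markdown.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    return all(has_boundary_pipes(ln) for ln in lines) and is_separator_row(lines[1])


if __name__ == "__main__":
    before = "Parameter | Guideline\n---|---\nValue1 | Value2"
    after = "| Parameter | Guideline |\n| --- | --- |\n| Value1 | Value2 |"
    print(table_has_boundary_pipes(before), table_has_boundary_pipes(after))
```

A checker like this could be pointed at the output of h.handle(html) from the manual test, though that pairing is only a suggestion and has not been run here.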

Impact

  • ✅ GFM compliant tables
  • ✅ Better parser compatibility (Jekyll, Hugo, CommonMark, showdown)
  • ✅ Improved IDE/preview rendering
  • ✅ No API changes (only the emitted table text differs; callers that string-match the old pipe-less output would need updating)
  • ✅ Works with both pad_tables settings

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (N/A - internal fix)
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

…ance

- Always output pipe before every cell (including first)
- Add trailing pipe at end of each row
- Fix separator row to include boundary pipes
- Add comprehensive unit tests for GFM compliance

Fixes unclecode#1731
Development

Successfully merging this pull request may close these issues.

[Bug]: Markdown tables missing outer pipe delimiters - causes parser compatibility issues