
@kolkov commented Jan 15, 2026

Summary

  • New 18th strategy, UseMultilineReverseSuffix, for multiline patterns that end in a literal suffix, such as (?m)^/.*\.php
  • 3.5-5.7x faster than stdlib on large inputs (previously 24% slower)
  • 12-130x faster on no-match cases due to efficient suffix prefilter rejection

Benchmark Results (0.5MB log file, 10,000 lines)

Operation                 coregex   stdlib    Speedup
IsMatch                   20.6 µs   72.2 µs   3.5x
Find                      15.3 µs   68.7 µs   4.5x
CountAll (200 matches)    2.56 ms   14.6 ms   5.7x

Algorithm

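  1. A suffix prefilter locates candidate occurrences of the literal suffix (.php)
  2. From each candidate, scan backward to the start of its line (the preceding \n, or position 0)
  3. Verify the match by running the PikeVM forward from that line start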

Files Changed

  • meta/reverse_suffix_multiline.go (NEW) - 234 lines
  • meta/reverse_suffix_multiline_test.go (NEW) - tests
  • meta/strategy.go - new strategy constant
  • meta/compile.go, meta/engine.go, meta/find.go, meta/ismatch.go - integration
  • CHANGELOG.md, README.md, ROADMAP.md - documentation
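The selection logic in meta/strategy.go and meta/compile.go is not shown in this PR description. As a rough, hypothetical illustration of the pattern shape this strategy targets, the sketch below uses Go's standard regexp/syntax parser to check for a top-level concatenation that starts with a multiline ^ (OpBeginLine) and ends with a case-sensitive literal, and returns that literal as the prefilter suffix. The helper multilineSuffix is an assumption, not part of coregex's API, and coregex's real criteria may differ.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// multilineSuffix (hypothetical) reports whether pattern looks like
// (?m)^ ... <literal>, returning the trailing literal as the suffix.
func multilineSuffix(pattern string) (string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl) // same flags regexp.Compile uses
	if err != nil {
		return "", false
	}
	if re.Op != syntax.OpConcat || len(re.Sub) < 2 {
		return "", false
	}
	first, last := re.Sub[0], re.Sub[len(re.Sub)-1]
	if first.Op != syntax.OpBeginLine {
		return "", false // without (?m), ^ parses as OpBeginText
	}
	if last.Op != syntax.OpLiteral || last.Flags&syntax.FoldCase != 0 {
		return "", false // need a case-sensitive literal suffix
	}
	return string(last.Rune), true
}

func main() {
	for _, p := range []string{`(?m)^/.*\.php`, `(?m)^/.*[\w-]+\.php`, `^/.*\.php`} {
		suffix, ok := multilineSuffix(p)
		fmt.Printf("%-22s ok=%v suffix=%q\n", p, ok, suffix)
	}
}
```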

Test Plan

  • All existing tests pass
  • New multiline tests pass
  • Correctness verified against stdlib
  • Benchmarks show speedup on large inputs
  • Linter: 0 issues

Fixes #97

@github-actions

Benchmark Comparison

Comparing main → PR #98

Summary: geomean 253.8n (main) → 252.3n (PR #98), -0.58%

⚠️ Potential regressions detected:

Benchmark                                          main          PR #98        delta
MatchAnchoredLiteral/short_match-4                 6.867n ± ∞    7.191n ± ∞    +4.72% (p=0.008 n=5)
MatchAnchoredLiteral/no_match_suffix-4             4.096n ± ∞    4.363n ± ∞    +6.52% (p=0.008 n=5)
AnchoredLiteralVsStdlib/coregex_short-4            8.720n ± ∞    9.094n ± ∞    +4.29% (p=0.008 n=5)
ASCIIOptimization_Issue79/medium_WithoutASCII-4    1.713µ ± ∞    1.784µ ± ∞    +4.14% (p=0.008 n=5)
BranchDispatch_Coregex/Digits-4                    9.038n ± ∞    9.655n ± ∞    +6.83% (p=0.008 n=5)
BranchDispatch_Coregex/UUID-4                      7.057n ± ∞    7.490n ± ∞    +6.14% (p=0.008 n=5)

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare

@kolkov merged commit 6552443 into main Jan 15, 2026
15 checks passed
@kolkov deleted the feature/multiline-reverse-suffix branch January 15, 2026 21:54


Development

Successfully merging this pull request may close these issues.

perf: (?m)^/.*[\w-]+\.php multiline+wildcard 24% slower than stdlib

2 participants