37 commits
48e3b4c Remove PCRE2_CODE_UNIT_WIDTH=16 (ltrzesniewski, Oct 11, 2025)
36e17b6 Add UTF-8 variant (ltrzesniewski, Oct 11, 2025)
ba6e823 Cleanup vcxproj (ltrzesniewski, Oct 11, 2025)
54affef Split pcrenet code (ltrzesniewski, Oct 12, 2025)
4a60357 Replace utf8/utf16 with 8bit/16bit (ltrzesniewski, Oct 12, 2025)
d478ea7 Split Native into Native8Bit/Native16Bit (ltrzesniewski, Oct 15, 2025)
c3ba718 Split InternalRegex (ltrzesniewski, Oct 19, 2025)
76e03f4 Merge tag 'v1.3.0' into utf8 (ltrzesniewski, Oct 25, 2025)
ef8e612 Fix post-merge build (ltrzesniewski, Oct 25, 2025)
e9a26c7 Split InternalRegex (ltrzesniewski, Nov 3, 2025)
7476a1e Generic ref Match (WIP callouts) (ltrzesniewski, Nov 5, 2025)
e00ade5 Add PcreMatchBufferUtf8 and callouts (ltrzesniewski, Nov 14, 2025)
3a4d649 Merge branch 'master' into utf8 (ltrzesniewski, Nov 14, 2025)
59c3814 Cleanup (ltrzesniewski, Nov 14, 2025)
075d9b4 Add PcrePatternInfo support (ltrzesniewski, Nov 14, 2025)
29f4bb8 Move stuff around (ltrzesniewski, Nov 16, 2025)
613902f Refactor native accessors (ltrzesniewski, Nov 16, 2025)
bd913b4 Implement standard exception constructors (ltrzesniewski, Nov 16, 2025)
77923b8 Add UTF-8 tests to PcreNetTests (ltrzesniewski, Nov 16, 2025)
919bc21 Add UTF-8 tests to PcreTests (ltrzesniewski, Nov 16, 2025)
19e9c5e Looks like there was a GC hole (ltrzesniewski, Nov 17, 2025)
db0601b v1.4.0-pre1 (ltrzesniewski, Nov 17, 2025)
cba487d Cleanup (ltrzesniewski, Nov 17, 2025)
60ca4e3 Merge branch 'master' into utf8 (ltrzesniewski, Nov 17, 2025)
3eafe08 Update public API test (ltrzesniewski, Nov 17, 2025)
340d88d Deduplicate code automatically POC (ltrzesniewski, Nov 22, 2025)
94d8e8d Deduplicate code in inner types (ltrzesniewski, Nov 22, 2025)
783131b Deduplicate other types (ltrzesniewski, Nov 22, 2025)
870562e Use symbols and inheritdoc in generator (ltrzesniewski, Nov 23, 2025)
dcb6ebb Remove even more duplicates (ltrzesniewski, Nov 23, 2025)
b602095 Fix build (ltrzesniewski, Nov 23, 2025)
26592f5 Revert "Remove even more duplicates" (ltrzesniewski, Nov 23, 2025)
d27ca2c Simplify generator (ltrzesniewski, Nov 23, 2025)
a8015c1 Added a "8Bit" variant (which still uses UTF-8) (ltrzesniewski, Nov 23, 2025)
b49d995 Removed the UTF-8 enforcement (ltrzesniewski, Nov 23, 2025)
3e2a7cc Add 8-bit tests and fix issues (ltrzesniewski, Nov 23, 2025)
2e153c0 Return encoding in PcreRegex8Bit (ltrzesniewski, Nov 24, 2025)
19 changes: 17 additions & 2 deletions README.md
@@ -11,6 +11,8 @@
 
 PCRE.NET is a .NET wrapper for the [PCRE2 library](https://github.com/PCRE2Project/pcre2).
 
+The library provides variants for UTF-16 (`string` and `ReadOnlySpan<char>`) and UTF-8 (`ReadOnlySpan<byte>`).
+
 The following systems are supported:
 
 - Windows x64
@@ -54,20 +56,32 @@ These methods return a `ref struct` type when possible, but are otherwise simila
 
 This is the fastest matching API the library provides.
 
-Call the `CreateMatchBuffer` method on a `PcreRegex` instance to create the necessary data structures up-front, then use the returned _match buffer_ for subsequent match operations. Performing a match through this buffer will not allocate further memory, reducing GC pressure and optimizing the process.
+Call the `CreateMatchBuffer` method on a `PcreRegex` or `PcreRegexUtf8` instance to create the necessary data structures up-front, then use the returned _match buffer_ for subsequent match operations. Performing a match through this buffer will not allocate further memory, reducing GC pressure and optimizing the process.
 
 The downside of this approach is that the returned match buffer is _not_ thread-safe and _not_ reentrant: you _cannot_ perform a match operation with a buffer which is already being used - match operations need to be sequential.
 
 It is also counter-productive to allocate a match buffer to perform a single match operation. Use this API if you need to match a pattern against many subject strings.
 
-`PcreMatchBuffer` objects are disposable (and finalizable in case they're not disposed). They provide an API for matching against `ReadOnlySpan<char>` subjects.
+`PcreMatchBuffer` objects are disposable (and finalizable in case they're not disposed). They provide an API for matching against `ReadOnlySpan<char>` subjects. The same applies to `PcreMatchBufferUtf8` objects, which match against `ReadOnlySpan<byte>` subjects.
 
 If you're looking for maximum speed, consider using the following options:
 
 - `PcreOptions.Compiled` at compile time to enable the JIT compiler, which will improve matching speed.
 - `PcreMatchOptions.NoUtfCheck` at match time to skip the Unicode validity check: by default PCRE2 scans the entire input string to make sure it's valid Unicode.
 - `PcreOptions.MatchInvalidUtf` at compile time if you plan to use `PcreMatchOptions.NoUtfCheck` and your subject strings may contain invalid Unicode sequences.
 
+### The UTF-8 API
+
+`PcreRegexUtf8` objects handle UTF-8 text provided as `ReadOnlySpan<byte>`.
+
+A Span API similar to the one mentioned above is provided, with the following methods:
+
+- `Matches`
+- `Match`
+- `IsMatch`
+
+There is also a zero-allocation API through the `CreateMatchBuffer` method.
+
 ### The DFA matching API
 
 This API provides regex matching in O(_subject length_) time. It is accessible through the `Dfa` property on a `PcreRegex` instance:
@@ -79,6 +93,7 @@ You can read more about its features in [the PCRE2 documentation](https://pcre2p
 
 ## Library highlights
 
+- Support for UTF-8 and UTF-16
 - Support for compiled patterns (x86/x64/arm64 JIT)
 - Support for partial matching (when the subject is too short to match the pattern)
 - Callout support (numbered and string-based)
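The README now documents two subject types: UTF-16 (`string`, `ReadOnlySpan<char>`) and UTF-8 (`ReadOnlySpan<byte>`). PCRE2 reports match offsets in code units of the width it was built for, so the same match generally sits at different offsets in the two encodings. The minimal C sketch below converts a UTF-8 byte offset into the equivalent UTF-16 code-unit offset. The function name `utf8_to_utf16_offset` is hypothetical (it is not part of PCRE.NET or PCRE2), it assumes valid UTF-8 input and an offset on a code point boundary, and whether the new UTF-8 API exposes byte offsets directly is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a byte offset into a UTF-8 buffer to the
 * equivalent UTF-16 code-unit offset. Assumes the input is valid UTF-8 and
 * that byte_offset falls on a code point boundary.
 */
size_t utf8_to_utf16_offset(const uint8_t *utf8, size_t byte_offset)
{
    size_t units = 0;
    size_t i = 0;

    while (i < byte_offset) {
        uint8_t lead = utf8[i];

        if (lead < 0x80) {          /* 1-byte sequence: U+0000..U+007F */
            i += 1;
            units += 1;
        } else if (lead < 0xE0) {   /* 2-byte sequence: U+0080..U+07FF */
            i += 2;
            units += 1;
        } else if (lead < 0xF0) {   /* 3-byte sequence: U+0800..U+FFFF */
            i += 3;
            units += 1;
        } else {                    /* 4-byte sequence: a UTF-16 surrogate pair */
            i += 4;
            units += 2;
        }
    }

    return units;
}
```

For the UTF-8 bytes of `héllo 😀!`, byte offset 11 (the `!`) maps to UTF-16 offset 8: `é` takes two bytes but one UTF-16 unit, and the emoji takes four bytes but a two-unit surrogate pair.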
116 changes: 70 additions & 46 deletions src/CMakeLists.txt
@@ -20,50 +20,74 @@ include_directories(PCRE/src)
 include_directories(PCRE.NET.Native)
 
 add_library(PCRE.NET.Native SHARED
-    PCRE/src/config.h
-    PCRE/src/pcre2.h
-    PCRE/src/pcre2_auto_possess.c
-    PCRE/src/pcre2_chartables.c
-    PCRE/src/pcre2_chkdint.c
-    PCRE/src/pcre2_compile.c
-    PCRE/src/pcre2_compile_cgroup.c
-    PCRE/src/pcre2_compile_class.c
-    PCRE/src/pcre2_config.c
-    PCRE/src/pcre2_context.c
-    PCRE/src/pcre2_convert.c
-    PCRE/src/pcre2_dfa_match.c
-    PCRE/src/pcre2_error.c
-    PCRE/src/pcre2_extuni.c
-    PCRE/src/pcre2_find_bracket.c
-    PCRE/src/pcre2_internal.h
-    PCRE/src/pcre2_intmodedep.h
-    PCRE/src/pcre2_jit_compile.c
-    PCRE/src/pcre2_maketables.c
-    PCRE/src/pcre2_match.c
-    PCRE/src/pcre2_match_data.c
-    PCRE/src/pcre2_match_next.c
-    PCRE/src/pcre2_newline.c
-    PCRE/src/pcre2_ord2utf.c
-    PCRE/src/pcre2_pattern_info.c
-    PCRE/src/pcre2_script_run.c
-    PCRE/src/pcre2_string_utils.c
-    PCRE/src/pcre2_study.c
-    PCRE/src/pcre2_substitute.c
-    PCRE/src/pcre2_substring.c
-    PCRE/src/pcre2_tables.c
-    PCRE/src/pcre2_ucd.c
-    PCRE/src/pcre2_ucp.h
-    PCRE/src/pcre2_valid_utf.c
-    PCRE/src/pcre2_xclass.c
-    PCRE.NET.Native/pcrenet.h
-    PCRE.NET.Native/pcrenet_compile.c
-    PCRE.NET.Native/pcrenet_convert.c
-    PCRE.NET.Native/pcrenet_info.c
-    PCRE.NET.Native/pcrenet_match.c
-    PCRE.NET.Native/pcrenet_substitute.c
-)
-
-add_compile_definitions(
-    HAVE_CONFIG_H
-    PCRE2_CODE_UNIT_WIDTH=16
+    PCRE.NET.Native/compile/pcre2_auto_possess.8bit.c
+    PCRE.NET.Native/compile/pcre2_auto_possess.16bit.c
+    PCRE.NET.Native/compile/pcre2_chartables.8bit.c
+    PCRE.NET.Native/compile/pcre2_chartables.16bit.c
+    PCRE.NET.Native/compile/pcre2_chkdint.8bit.c
+    PCRE.NET.Native/compile/pcre2_chkdint.16bit.c
+    PCRE.NET.Native/compile/pcre2_compile.8bit.c
+    PCRE.NET.Native/compile/pcre2_compile.16bit.c
+    PCRE.NET.Native/compile/pcre2_compile_cgroup.8bit.c
+    PCRE.NET.Native/compile/pcre2_compile_cgroup.16bit.c
+    PCRE.NET.Native/compile/pcre2_compile_class.8bit.c
+    PCRE.NET.Native/compile/pcre2_compile_class.16bit.c
+    PCRE.NET.Native/compile/pcre2_config.8bit.c
+    PCRE.NET.Native/compile/pcre2_config.16bit.c
+    PCRE.NET.Native/compile/pcre2_context.8bit.c
+    PCRE.NET.Native/compile/pcre2_context.16bit.c
+    PCRE.NET.Native/compile/pcre2_convert.8bit.c
+    PCRE.NET.Native/compile/pcre2_convert.16bit.c
+    PCRE.NET.Native/compile/pcre2_dfa_match.8bit.c
+    PCRE.NET.Native/compile/pcre2_dfa_match.16bit.c
+    PCRE.NET.Native/compile/pcre2_error.8bit.c
+    PCRE.NET.Native/compile/pcre2_error.16bit.c
+    PCRE.NET.Native/compile/pcre2_extuni.8bit.c
+    PCRE.NET.Native/compile/pcre2_extuni.16bit.c
+    PCRE.NET.Native/compile/pcre2_find_bracket.8bit.c
+    PCRE.NET.Native/compile/pcre2_find_bracket.16bit.c
+    PCRE.NET.Native/compile/pcre2_jit_compile.8bit.c
+    PCRE.NET.Native/compile/pcre2_jit_compile.16bit.c
+    PCRE.NET.Native/compile/pcre2_maketables.8bit.c
+    PCRE.NET.Native/compile/pcre2_maketables.16bit.c
+    PCRE.NET.Native/compile/pcre2_match.8bit.c
+    PCRE.NET.Native/compile/pcre2_match.16bit.c
+    PCRE.NET.Native/compile/pcre2_match_data.8bit.c
+    PCRE.NET.Native/compile/pcre2_match_data.16bit.c
+    PCRE.NET.Native/compile/pcre2_match_next.8bit.c
+    PCRE.NET.Native/compile/pcre2_match_next.16bit.c
+    PCRE.NET.Native/compile/pcre2_newline.8bit.c
+    PCRE.NET.Native/compile/pcre2_newline.16bit.c
+    PCRE.NET.Native/compile/pcre2_ord2utf.8bit.c
+    PCRE.NET.Native/compile/pcre2_ord2utf.16bit.c
+    PCRE.NET.Native/compile/pcre2_pattern_info.8bit.c
+    PCRE.NET.Native/compile/pcre2_pattern_info.16bit.c
+    PCRE.NET.Native/compile/pcre2_script_run.8bit.c
+    PCRE.NET.Native/compile/pcre2_script_run.16bit.c
+    PCRE.NET.Native/compile/pcre2_string_utils.8bit.c
+    PCRE.NET.Native/compile/pcre2_string_utils.16bit.c
+    PCRE.NET.Native/compile/pcre2_study.8bit.c
+    PCRE.NET.Native/compile/pcre2_study.16bit.c
+    PCRE.NET.Native/compile/pcre2_substitute.8bit.c
+    PCRE.NET.Native/compile/pcre2_substitute.16bit.c
+    PCRE.NET.Native/compile/pcre2_substring.8bit.c
+    PCRE.NET.Native/compile/pcre2_substring.16bit.c
+    PCRE.NET.Native/compile/pcre2_tables.8bit.c
+    PCRE.NET.Native/compile/pcre2_tables.16bit.c
+    PCRE.NET.Native/compile/pcre2_ucd.8bit.c
+    PCRE.NET.Native/compile/pcre2_ucd.16bit.c
+    PCRE.NET.Native/compile/pcre2_valid_utf.8bit.c
+    PCRE.NET.Native/compile/pcre2_valid_utf.16bit.c
+    PCRE.NET.Native/compile/pcre2_xclass.8bit.c
+    PCRE.NET.Native/compile/pcre2_xclass.16bit.c
+    PCRE.NET.Native/compile/pcrenet_compile.8bit.c
+    PCRE.NET.Native/compile/pcrenet_compile.16bit.c
+    PCRE.NET.Native/compile/pcrenet_convert.8bit.c
+    PCRE.NET.Native/compile/pcrenet_convert.16bit.c
+    PCRE.NET.Native/compile/pcrenet_info.8bit.c
+    PCRE.NET.Native/compile/pcrenet_info.16bit.c
+    PCRE.NET.Native/compile/pcrenet_match.8bit.c
+    PCRE.NET.Native/compile/pcrenet_match.16bit.c
+    PCRE.NET.Native/compile/pcrenet_substitute.8bit.c
+    PCRE.NET.Native/compile/pcrenet_substitute.16bit.c
 )
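This hunk drops the global `PCRE2_CODE_UNIT_WIDTH=16` definition and lists every source twice, as `.8bit.c` and `.16bit.c` files under `PCRE.NET.Native/compile/`. The contents of those wrapper files are not shown in this diff; a plausible assumption is that each one defines `PCRE2_CODE_UNIT_WIDTH` and then includes the corresponding source. PCRE2 uses token pasting to append the code unit width to its public function names (for instance, `pcre2_compile` becomes `pcre2_compile_8` or `pcre2_compile_16`), which is presumably what allows both widths to live in one shared library. The self-contained C sketch below imitates that technique; `GLUE`, `XGLUE`, `SUFFIX`, `CODE_UNIT` and `count_units_*` are hypothetical names, and a macro stands in for the included source file so that both "builds" fit in one file.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-level token pasting so that CODE_UNIT_WIDTH is expanded before ## applies. */
#define GLUE(a, b) a##b
#define XGLUE(a, b) GLUE(a, b)

/* Width-dependent names, resolved each time the shared body below is expanded. */
#define SUFFIX(name) XGLUE(name, CODE_UNIT_WIDTH)
#define CODE_UNIT XGLUE(XGLUE(uint, CODE_UNIT_WIDTH), _t)

/* Stand-in for a shared source file, written once against the generic names. */
#define DEFINE_COUNT_UNITS                           \
    size_t SUFFIX(count_units_)(const CODE_UNIT *s)  \
    {                                                \
        size_t n = 0;                                \
        while (s[n] != 0)                            \
            n++;                                     \
        return n;                                    \
    }

/* "8bit" build: emits size_t count_units_8(const uint8_t *s) */
#define CODE_UNIT_WIDTH 8
DEFINE_COUNT_UNITS
#undef CODE_UNIT_WIDTH

/* "16bit" build: emits size_t count_units_16(const uint16_t *s) */
#define CODE_UNIT_WIDTH 16
DEFINE_COUNT_UNITS
#undef CODE_UNIT_WIDTH
```

Each expansion of the shared body yields a distinct symbol (`count_units_8`, `count_units_16`), so both can be linked into the same binary without clashing.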
2 changes: 1 addition & 1 deletion src/Directory.Build.props
@@ -19,7 +19,7 @@
   </PropertyGroup>
 
   <PropertyGroup>
-    <Version>1.3.0</Version>
+    <Version>1.4.0-pre1</Version>
   </PropertyGroup>
 
   <PropertyGroup Condition="'$(NCrunch)' == '1'">
2 changes: 1 addition & 1 deletion src/NuGetReadme.md
@@ -5,7 +5,7 @@
 
 PCRE.NET is a .NET wrapper for the [PCRE2 library](https://github.com/PCRE2Project/pcre2).
 
-**v1.3.0** is based on PCRE2 **v10.47**.
+**v1.4.0-pre1** is based on PCRE2 **v10.47**.
 
 The following systems are supported:
 
86 changes: 86 additions & 0 deletions src/PCRE.NET.InternalAnalyzers/CodeWriter.cs
@@ -0,0 +1,86 @@
+using System;
+using System.Text;
+
+namespace PCRE.NET.InternalAnalyzers;
+
+internal class CodeWriter
+{
+    private const int _indentWidth = 4;
+
+    private readonly StringBuilder _sb = new();
+    private bool _isAtStartOfLine = true;
+
+    public int Indent { get; set; }
+
+    public CodeWriter Append<T>(T? value)
+    {
+        Append(value?.ToString());
+        return this;
+    }
+
+    public CodeWriter Append(string? value)
+    {
+        if (!string.IsNullOrEmpty(value))
+        {
+            WriteIndent();
+            _sb.Append(value);
+        }
+
+        return this;
+    }
+
+    public CodeWriter AppendLine<T>(T? value)
+    {
+        AppendLine(value?.ToString());
+        return this;
+    }
+
+    public CodeWriter AppendLine(string? value = null)
+    {
+        if (!string.IsNullOrEmpty(value))
+        {
+            WriteIndent();
+            _sb.Append(value);
+        }
+
+        _sb.AppendLine();
+        _isAtStartOfLine = true;
+
+        return this;
+    }
+
+    public override string ToString()
+        => _sb.ToString();
+
+    public BlockScope WriteBlock()
+    {
+        EnsureIsOnNewLine();
+        AppendLine("{");
+        Indent++;
+        return new BlockScope(this);
+    }
+
+    private void EnsureIsOnNewLine()
+    {
+        if (!_isAtStartOfLine)
+            AppendLine();
+    }
+
+    private void WriteIndent()
+    {
+        if (_isAtStartOfLine)
+            _sb.Append(' ', Indent * _indentWidth);
+
+        _isAtStartOfLine = false;
+    }
+
+    public readonly struct BlockScope(CodeWriter writer) : IDisposable
+    {
+        public void Dispose()
+        {
+            writer.EnsureIsOnNewLine();
+            writer.Indent--;
+            writer.AppendLine("}");
+        }
+    }
+}
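`CodeWriter` is a small helper, presumably used by the code-deduplication generator added in this PR. It writes indentation lazily, only before the first text on a line, and `WriteBlock()` returns a `BlockScope` whose `Dispose` closes the brace, so `using` statements mirror the generated nesting. To keep all added examples in one language (C, as used by the native `PCRE.NET.Native` sources), the sketch below is a hypothetical C port of the same behavior. The `code_writer` struct and `cw_*` functions are illustrative names, and explicit `cw_begin_block`/`cw_end_block` calls stand in for the `using` scope. It writes `\n` where the C# version's `StringBuilder.AppendLine` writes `Environment.NewLine`.

```c
#include <stdlib.h>
#include <string.h>

#define INDENT_WIDTH 4 /* mirrors _indentWidth in the C# class */

typedef struct {
    char *buf;            /* NUL-terminated output, grown on demand */
    size_t len;
    size_t cap;
    int indent;           /* current nesting level (C#: Indent) */
    int at_start_of_line; /* C#: _isAtStartOfLine */
} code_writer;

code_writer cw_new(void)
{
    code_writer w = { NULL, 0, 0, 0, 1 };
    return w;
}

/* Append n raw bytes, without any indentation logic. */
static void cw_raw(code_writer *w, const char *s, size_t n)
{
    if (w->len + n + 1 > w->cap) {
        size_t cap = w->cap ? w->cap : 64;
        while (w->len + n + 1 > cap)
            cap *= 2;
        char *p = realloc(w->buf, cap);
        if (p == NULL)
            abort();
        w->buf = p;
        w->cap = cap;
    }
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    w->buf[w->len] = '\0';
}

/* Indentation is emitted lazily, only before the first text on a line. */
static void cw_write_indent(code_writer *w)
{
    if (w->at_start_of_line) {
        for (int i = 0; i < w->indent * INDENT_WIDTH; i++)
            cw_raw(w, " ", 1);
    }
    w->at_start_of_line = 0;
}

void cw_append(code_writer *w, const char *s)
{
    if (s != NULL && *s != '\0') {
        cw_write_indent(w);
        cw_raw(w, s, strlen(s));
    }
}

/* Blank lines (s == NULL or "") get no trailing indentation. */
void cw_append_line(code_writer *w, const char *s)
{
    cw_append(w, s);
    cw_raw(w, "\n", 1);
    w->at_start_of_line = 1;
}

static void cw_ensure_new_line(code_writer *w)
{
    if (!w->at_start_of_line)
        cw_append_line(w, NULL);
}

/* Counterpart of WriteBlock(): open a brace, then indent. */
void cw_begin_block(code_writer *w)
{
    cw_ensure_new_line(w);
    cw_append_line(w, "{");
    w->indent++;
}

/* Counterpart of BlockScope.Dispose(): dedent, then close the brace. */
void cw_end_block(code_writer *w)
{
    cw_ensure_new_line(w);
    w->indent--;
    cw_append_line(w, "}");
}

const char *cw_str(const code_writer *w)
{
    return w->buf != NULL ? w->buf : "";
}

void cw_free(code_writer *w)
{
    free(w->buf);
    *w = cw_new();
}
```

For example, appending `class Foo`, opening a block, writing the line `int x;` and closing the block produces `class Foo\n{\n    int x;\n}\n`: the pending line is terminated before the opening brace, and the body is indented by four spaces.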