Benchmarks

xkqg edited this page Apr 23, 2026 · 10 revisions

Measured on AMD Ryzen 9 3950X (16C/32T, 3.49 GHz base), Windows 11 24H2, .NET 10.0.6, X64 RyuJIT AVX2 / x86-64-v3, BenchmarkDotNet 0.14 / 0.15, Release mode. The authoritative report with full per-suite tables lives at BENCHMARKS.md in the repo root.


v1.9.0 — Indicator expansion (no hot-path change)

v1.9.0 adds 12 indicators across Tier 3a/b/c (52 total) and does not touch any rendering path — every SVG/indicator/numerics number published at v1.7 / v1.8 is still current. A dedicated Tier3IndicatorBenchmarks suite will land post-v1.9.0.

Headline numbers (Ryzen 9 3950X, 100 000-point series unless noted):

| Scenario | Time | Allocation |
|---|---|---|
| SVG render a 1 000-point line | 66 µs | 127 KB |
| SVG render with LTTB downsample (100 000 pts) | 1.78 ms | 2.4 MB |
| SIMD Vec.Sum / Vec.Mean on 100 000 pts | 19 µs | 0 B |
| SMA(20) on 100 000 pts | 196 µs | 781 KB |
| VWAP on 100 000 pts | 204 µs | 781 KB |
| EquityCurve on 100 000 pts | 189 µs | 781 KB |
| JSON round-trip (Figure ⇄ JSON) | 40 µs | 19.6 KB |
| PNG export via SkiaSharp | 27 ms | 88 KB |
| TransformBatch (AVX) on 100 000 pts | 208 µs | 1.5 MB |

See BENCHMARKS.md for full per-suite tables, allocation breakdowns, and historical v0.5 → v1.1 comparisons.


v1.7.0 — Streaming, MathText, Themes, Rendering

| Benchmark | Result | Notes |
|---|---|---|
| RingBuffer.Append | 40M ops/sec | Single-writer, ReaderWriterLockSlim, 100K capacity |
| RingBuffer.ToArray (10K snapshot) | 190K snapshots/sec | 10K-element copy per snapshot |
| StreamingLine 100K appends + snapshot | 13 ms total | Append 100K points + create snapshot |
| MathText parse `\sum_{i=0}^{n} \frac{1}{i!} = e` | 473K parses/sec | Operator limits + fraction in single expression |
| MathText parse `\begin{pmatrix} a & b \\ c & d \end{pmatrix}` | 554K parses/sec | 2×2 matrix environment |
| 26 theme presets loaded | 370 µs | All 26 themes instantiated from static properties |
| SVG render (1K-point line chart) | 1.12 ms | Full pipeline: data range → ticks → axes → series → SVG string |
| SVG render (3D surface + rotation script) | 1.22 ms | Includes Projection3D, depth sort, JS injection |
| Natural Earth 110m coastlines loaded | 5 ms | 134 features parsed from embedded GeoJSON |
| Natural Earth 110m countries loaded | 48 ms | 177 features (cached after first load) |

Geographic projection throughput (100K forward projections)

| Projection | ops/sec | Notes |
|---|---|---|
| Sinusoidal | 83M/sec | Simplest: single cosine |
| NaturalEarth | 77M/sec | Polynomial, no trig |
| Robinson | 33M/sec | Table interpolation |
| AlbersEqualArea | 29M/sec | Conic |
| Mercator | 25M/sec | Log tangent |
| Stereographic | 24M/sec | Azimuthal |
| EqualEarth | 19M/sec | Modern polynomial |
| Orthographic | 17M/sec | Globe view |
| PlateCarree | 16M/sec | Identity (overhead is the method call) |
| LambertConformal | 16M/sec | Conic |
| TransverseMercator | 14M/sec | Rotated cylindrical |
| AzimuthalEquidistant | 13M/sec | Acos + trig |
| Mollweide | 6M/sec | Newton iteration (slowest) |

All projections exceed 5M projections/sec — projecting all 177 countries takes < 1ms even with Mollweide.
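To make the cost differences in the table concrete, here is a minimal Python sketch of three forward projections using their textbook formulas on a unit sphere. It is an illustration only, not MatPlotLibNet's C# implementation: the function names, the radian inputs, and the Newton tolerance are assumptions made for this example.

```python
import math

def sinusoidal(lon, lat):
    """Sinusoidal: a single cosine per point."""
    return lon * math.cos(lat), lat

def mercator(lon, lat):
    """Mercator: log-tangent form (diverges at the poles)."""
    return lon, math.log(math.tan(math.pi / 4 + lat / 2))

def mollweide(lon, lat, tol=1e-12, max_iter=50):
    """Mollweide: solve 2t + sin(2t) = pi * sin(lat) for t by Newton iteration."""
    if abs(abs(lat) - math.pi / 2) < 1e-12:
        # At the poles the derivative vanishes, but the solution is known.
        theta = math.copysign(math.pi / 2, lat)
    else:
        theta = lat                              # initial guess
        target = math.pi * math.sin(lat)
        for _ in range(max_iter):
            f = 2 * theta + math.sin(2 * theta) - target
            step = f / (2 + 2 * math.cos(2 * theta))
            theta -= step
            if abs(step) < tol:
                break
    x = 2 * math.sqrt(2) / math.pi * lon * math.cos(theta)
    y = math.sqrt(2) * math.sin(theta)
    return x, y
```

Sinusoidal needs one trigonometric call per point, Mercator a tangent plus a logarithm, and Mollweide several sine/cosine evaluations per Newton step, which is consistent with the ordering in the table.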

Streaming performance targets

| Scenario | Buffer | Append rate | Render FPS | Memory |
|---|---|---|---|---|
| Dashboard | 1K | 1/sec | 1 | 16 KB |
| Telemetry | 10K | 100/sec | 30 | 640 KB |
| Oscilloscope | 100K | 10K/sec | 60 | 800 KB |
| Trading (OHLC) | 5K | 10/sec | 10 | 160 KB |

With 40M appends/sec and 190K snapshots/sec, the ring buffer is never the bottleneck; rendering is. At 30 fps (33 ms budget), a 10K-element snapshot costs about 5 µs (1 / 190K per second), so each frame keeps essentially the whole budget for rendering.
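The frame-budget arithmetic can be checked with a short Python sketch. The two throughput constants are the measured figures from the v1.7.0 table; the function names and the Telemetry example (100 appends/sec at 30 FPS, one snapshot per frame) are assumptions chosen for illustration.

```python
APPENDS_PER_SEC = 40e6      # RingBuffer.Append throughput (measured, see table)
SNAPSHOTS_PER_SEC = 190e3   # RingBuffer.ToArray throughput on a 10K-element buffer

def frame_budget_ms(fps):
    """Total time available per frame, in milliseconds."""
    return 1000.0 / fps

def render_budget_ms(fps, appends_per_frame, snapshots_per_frame=1):
    """Time left for rendering after buffer appends and snapshots."""
    append_ms = appends_per_frame / APPENDS_PER_SEC * 1000.0
    snapshot_ms = snapshots_per_frame / SNAPSHOTS_PER_SEC * 1000.0
    return frame_budget_ms(fps) - append_ms - snapshot_ms

# Telemetry scenario: 100 appends/sec at 30 FPS is about 3.3 appends per frame.
telemetry = render_budget_ms(fps=30, appends_per_frame=100 / 30)
print(f"render budget: {telemetry:.3f} ms of {frame_budget_ms(30):.3f} ms")
```

The buffer work amounts to a few microseconds per frame, so nearly all of the frame time remains for rendering.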


Historical Benchmarks

v1.1.1 — Series Benchmarks

PolarHeatmapSeries rendering and the NumPy-style numeric operations added in v1.1.1 are not yet in the benchmark suite. The polar heatmap renders via the same 12-segment polygon path as PolarBarSeries — expected render time is comparable to the existing PolarLine (33 µs) given similar polygon counts per chart.

Run the full suite to establish a baseline:

dotnet run -c Release --project Benchmarks/MatPlotLibNet.Benchmarks -- --filter "*"

v1.1.1 — DataFrame Benchmarks

New DataFrameBenchmarks class covers every public method in MatPlotLibNet.DataFrame — column reader, all 16 financial indicators, polynomial numerics, and figure builder extensions (with and without hue grouping).

27 benchmarks across 5 groups, 3 data sizes (1K / 10K / 100K rows):

| Group | Benchmarks |
|---|---|
| Column reader | ToDoubleArray (baseline), ToStringArray |
| Price indicators | SMA, EMA, RSI, MACD, BollingerBands, DrawDown, OBV |
| OHLCV indicators | ATR, ADX, ADXFull, CCI, Stochastic, WilliamsR, KeltnerChannels, VWAP, ParabolicSAR |
| Numerics | PolyFit(deg 3), PolyEval(deg 3), ConfidenceBand(95%) |
| Figure builders | Line, Scatter, Hist (plain + with 3-group hue split) |

Key insight: comparing Line_Close against Line_WithHue measures the hue-grouping overhead, because the difference isolates the HueGrouper cost from the render cost.

dotnet run -c Release --project Benchmarks/MatPlotLibNet.Benchmarks -- --filter "*DataFrame*"
dotnet run -c Release --project Benchmarks/MatPlotLibNet.Benchmarks -- --filter "*DataFrame*Indicator*"
dotnet run -c Release --project Benchmarks/MatPlotLibNet.Benchmarks -- --filter "*DataFrame*Hue*"

v1.1.0 — New Benchmarks

Added benchmarks for one new series type and a SIMD improvement. (The GeoMap_Equirectangular and Choropleth_Viridis benchmarks from v1.1.0 were removed in v1.1.4 along with the Geo/Map subsystem itself — see the Roadmap for rationale.)

| Benchmark | Time | Allocated |
|---|---|---|
| Surface3D_WithLighting (10×10 grid + directional light) | 82 µs | 148 KB |

VectorMath.SplitPositiveNegative — replaced per-element branching with two TensorPrimitives.Max/Min SIMD passes. Faster for all spans > ~16 elements on AVX2 hardware.
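The idea behind replacing the per-element branch with two Max/Min passes can be sketched in a few lines of Python. This is a scalar illustration only: it assumes the split zero-fills the opposite sign in each output, and the function name is hypothetical. The actual C# code uses TensorPrimitives.Max/Min, which vectorise each pass.

```python
def split_positive_negative(values):
    """Split a series into positive and negative parts with two branch-free passes."""
    positives = [max(v, 0.0) for v in values]   # pass 1: clamp from below at zero
    negatives = [min(v, 0.0) for v in values]   # pass 2: clamp from above at zero
    return positives, negatives

pos, neg = split_positive_negative([1.5, -2.0, 0.0, 3.0])
print(pos, neg)
```

Each pass applies the same operation to every element with no data-dependent branch, which is the shape that SIMD instructions handle well.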

dotnet run -c Release --project Benchmarks/MatPlotLibNet.Benchmarks -- --filter "*Surface3D_WithLighting*" --memory

Architecture — Why Server-Side SVG?

MatPlotLibNet renders charts server-side as SVG and delivers them to clients. No JavaScript chart library on the client — the browser just swaps innerHTML.

| Benefit | Detail |
|---|---|
| Minimal client-side cost | Browser swaps innerHTML; no canvas redraws and no client-side chart computation |
| Inline SVG | Part of the DOM: styleable via CSS, accessible to screen readers, prints as vector |
| Consistent | Every client sees the exact same chart, no browser rendering differences |
| Bandwidth-efficient | Typical chart SVG is 5–15 KB; SignalR pushes only changed charts |
| Scales with hardware | Parallel subplot rendering uses all available cores |

Coordinate Transform (DataTransform)

The hot path: data space → pixel space. v0.6.0 replaced a two-pass TensorPrimitives approach with a single-pass AVX SIMD interleave: Vector256.Multiply + Add (FMA when available) → Avx.UnpackLow/High → Avx.Permute2x128 → direct store via MemoryMarshal.Cast. Scalar fallback on non-x86.

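| Size | v0.5.1 | v0.6.0 | Speedup | Alloc reduction |
|---|---|---|---|---|
| 1K pts | 9 µs | 764 ns | 11.8× | not reported |
| 10K pts | 124 µs | 53 µs | 2.3× | not reported |
| 100K pts | 1,298 µs / 3,047 KB | 208 µs / 1,563 KB | 6.2× | 1.9× (3,047 KB → 1,563 KB) |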

Every LineSeries, ScatterSeries, AreaSeries, and BubbleSeries renderer uses this path — all indicator output benefits automatically.
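As a rough picture of what the transform computes, the following Python sketch does the scalar equivalent: one multiply-add per coordinate, written straight into an interleaved x, y output. It does not reproduce the AVX path. The function name, the parameter list, and the flipped y axis (SVG y grows downward) are assumptions made for this example.

```python
def transform_interleaved(xs, ys, x_min, x_max, y_min, y_max,
                          left, top, width, height):
    """Map data-space points to pixel space as [px0, py0, px1, py1, ...]."""
    sx = width / (x_max - x_min)
    sy = -height / (y_max - y_min)        # negative scale flips the y axis
    ox = left - x_min * sx
    oy = top + height - y_min * sy
    out = [0.0] * (2 * len(xs))
    for i, (x, y) in enumerate(zip(xs, ys)):
        out[2 * i] = x * sx + ox          # multiply + add for x
        out[2 * i + 1] = y * sy + oy      # multiply + add for y
    return out

# Data range [0, 10] x [0, 5] mapped into a 400x300 plot area at (50, 20).
pixels = transform_interleaved([0, 10], [0, 5], 0, 10, 0, 5, 50, 20, 400, 300)
print(pixels)
```

Writing x and y in one pass into a single interleaved buffer is what lets a single-pass implementation avoid a second traversal and an intermediate array.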


SVG Rendering

| Chart | Time | Allocated |
|---|---|---|
| Simple line (100 pts) | 94 µs | 136 KB |
| Line + scatter + bar | 109 µs | 133 KB |
| 3×3 subplot grid | 754 µs | 933 KB |
| Treemap (6 nodes) | 60 µs | 109 KB |
| Sunburst (4 nodes, depth 2) | 65 µs | 118 KB |
| Sankey (4 nodes, 4 links) | 63 µs | 118 KB |
| Polar line (50 pts) | 33 µs | 56 KB |
| 3D surface (10×10) | 69 µs | 124 KB |
| 3D surface (10×10) + directional lighting | 82 µs | 148 KB |
| Line + legend (3 series) | 140 µs | 214 KB |
| Large line (10K pts) | 3,105 µs | 3,714 KB |
| Large line (100K pts, LTTB→2K) | 1,332 µs | 2,429 KB |

LTTB downsampling makes 100K-point charts faster than full-resolution 10K charts.
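For readers unfamiliar with LTTB (Largest-Triangle-Three-Buckets), here is a compact Python sketch of the standard algorithm. It is not the library's C# code, and the function name and signature are assumptions for illustration. The series is split into buckets, and from each bucket the point that forms the largest triangle with the previously kept point and the average of the next bucket is retained.

```python
def lttb(xs, ys, threshold):
    """Downsample (xs, ys) to `threshold` points, keeping visually salient ones."""
    n = len(xs)
    if threshold >= n or threshold < 3:
        return list(xs), list(ys)            # nothing to downsample

    bucket = (n - 2) / (threshold - 2)       # first and last points are always kept
    out_x, out_y = [xs[0]], [ys[0]]
    a = 0                                    # index of the previously selected point

    for i in range(threshold - 2):
        # Third triangle vertex: the average of the *next* bucket.
        nxt_start = int((i + 1) * bucket) + 1
        nxt_end = min(int((i + 2) * bucket) + 1, n)
        count = nxt_end - nxt_start
        avg_x = sum(xs[nxt_start:nxt_end]) / count
        avg_y = sum(ys[nxt_start:nxt_end]) / count

        # Keep the point in the current bucket with the largest triangle
        # (the expression is twice the area, which preserves the ordering).
        start = int(i * bucket) + 1
        end = int((i + 1) * bucket) + 1
        best, best_area = start, -1.0
        for j in range(start, end):
            area = abs((xs[a] - avg_x) * (ys[j] - ys[a])
                       - (xs[a] - xs[j]) * (avg_y - ys[a]))
            if area > best_area:
                best, best_area = j, area

        out_x.append(xs[best])
        out_y.append(ys[best])
        a = best

    out_x.append(xs[-1])
    out_y.append(ys[-1])
    return out_x, out_y
```

The downsampling pass is linear in the input size, and the renderer then only emits path segments for the reduced set (2K points in the benchmark above), which is why the 100K-point chart can finish sooner than a full-resolution 10K-point one.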


Technical Indicators (100K data points)

Phase F indicators

At 100K points (more than a full trading day of 1-second bars), every indicator below completes in about 3.3 ms or less as of v0.6.0. Multiple indicators can run in parallel on separate cores.

| Indicator | v0.5.1 | v0.6.0 | Note |
|---|---|---|---|
| SMA(20) | 196 µs | 195 µs | Sliding sum |
| EMA(20) | 496 µs | 491 µs | Sequential |
| RSI(14) | 851 µs | 892 µs | |
| VWAP | 212 µs | 238 µs | |
| EquityCurve | 349 µs | 226 µs | CumulativeSum + Linspace |
| BollingerBands(20) | 2,016 µs | 2,231 µs | SIMD inner loop |
| MACD(12,26,9) | 1,574 µs | 1,495 µs | |
| ADX(14) | 2,609 µs | 2,434 µs | |
| Stochastic(14,3) | 7,669 µs | 3,308 µs | 2.3× faster: O(n*p) → O(n) monotone deque |
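The monotone-deque technique behind the Stochastic speedup can be sketched in Python as follows. This is a generic illustration of the technique under stated assumptions, not the library's implementation: the function names are hypothetical, %K uses the textbook formula 100 × (close − lowest low) / (highest high − lowest low), and a flat window is mapped to 0.0 by choice.

```python
from collections import deque

def rolling_extreme(values, period, want_max):
    """Rolling max (or min) over the trailing `period` values in O(n) total."""
    out = []
    window = deque()   # indices whose values stay monotone from front to back
    for i, v in enumerate(values):
        # Drop candidates dominated by the new value; they can never win again.
        while window and ((values[window[-1]] <= v) if want_max
                          else (values[window[-1]] >= v)):
            window.pop()
        window.append(i)
        if window[0] <= i - period:        # front index slid out of the window
            window.popleft()
        if i >= period - 1:
            out.append(values[window[0]])  # front is the current extreme
    return out

def stochastic_k(high, low, close, period=14):
    """Raw %K for each bar that has a full look-back window."""
    highest = rolling_extreme(high, period, True)
    lowest = rolling_extreme(low, period, False)
    result = []
    for i, (h, l) in enumerate(zip(highest, lowest)):
        c = close[i + period - 1]
        result.append(100.0 * (c - l) / (h - l) if h != l else 0.0)
    return result
```

Each index is pushed and popped at most once, so the total work is linear in the series length regardless of the period, whereas rescanning the window for every bar costs O(n*p).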

New indicators (v0.6.0) at 100K

| Indicator | Time | Allocated |
|---|---|---|
| OBV | 645 µs | 781 KB |
| ParabolicSAR | 1,211 µs | 879 KB |
| CCI(20) | 2,159 µs | 2,344 KB |
| WilliamsR(14) | 2,972 µs | 3,125 KB |

Vec SIMD Operations

Vec is a readonly record struct wrapping double[] with SIMD-accelerated operators via TensorPrimitives.

Element-wise (allocates result array)

| Operation | 1K | 10K | 100K |
|---|---|---|---|
| a + b | 452 ns | 3.8 µs | 120 µs |
| a × scalar | 386 ns | 2.9 µs | 120 µs |
| (a+b)×1.5−b | 1.3 µs | 11 µs | 447 µs |
| Std | 803 ns | 8.4 µs | 172 µs |

Reductions (zero allocation)

| Operation | 1K | 10K | 100K |
|---|---|---|---|
| Sum | 164 ns | 1.8 µs | 18 µs |
| Mean | 166 ns | 1.8 µs | 18 µs |
| Min | 434 ns | 4.5 µs | 44 µs |
| Max | 335 ns | 3.4 µs | 34 µs |

Reductions are zero-alloc. At 100K, Sum and Mean (18 µs) are ~6.7× faster than an element-wise add (120 µs); Min and Max are roughly 3× faster.


JSON Serialization

Round-trip under 50 µs → >20,000 chart specs/sec on a single core.

| Method | Time | Allocated |
|---|---|---|
| ToJson | 26 µs | 8 KB |
| FromJson | 21 µs | 12 KB |
| Round-trip | 41 µs | 20 KB |

PNG / PDF Export (SkiaSharp)

Dominated by SkiaSharp rasterization. Suited for batch export, not real-time streaming.

| Method | Time | Allocated |
|---|---|---|
| PNG (simple) | 27 ms | 88 KB |
| PNG (complex) | 22 ms | 81 KB |
| PDF (simple) | 47 ms | 3,925 KB |
| PDF (complex) | 47 ms | 3,922 KB |

Running the Benchmarks Yourself

cd Benchmarks/MatPlotLibNet.Benchmarks

dotnet run -c Release -- --filter "*SvgRendering*"
dotnet run -c Release -- --filter "*DataTransform*"
dotnet run -c Release -- --filter "*Indicator*"
dotnet run -c Release -- --filter "*VectorMath*"
dotnet run -c Release -- --filter "*Serialization*"
dotnet run -c Release -- --filter "*SkiaExport*"
dotnet run -c Release -- --filter "*"    # all suites

Run one suite at a time — concurrent benchmark runs inflate timings due to CPU contention.
