Experiment of rewriting in Ruby #218

Draft
yosiat wants to merge 62 commits into master from ruby-impl-perf

yosiat commented Mar 30, 2026

Owner

No description provided.

yosiat commented Mar 30, 2026

Owner Author

Performance Analysis: Ruby Implementation vs C Extension

Benchmarks run on Ruby 4.0.2 (2026-03-17, +PRISM, +YJIT, arm64-darwin25), Rails 8.0.
Both master and ruby-impl-perf run with YJIT enabled (Ruby 4.0 default).


panko_json — JSON string output (primary benchmark)

Throughput (iterations/sec, higher is better)

| Benchmark | Master (C ext) | Ruby impl | Change |
|---|---:|---:|---:|
| Simple, 50 posts | 38,566 | 54,443 | +41% |
| Simple, 2300 posts | 862 | 1,434 | +66% |
| HasOne, 50 posts | 21,771 | 32,777 | +51% |
| HasOne, 2300 posts | 500 | 842 | +68% |
| HasMany, 50 authors | 18,086 | 20,404 | +13% |
| HasMany, 2300 authors | 409 | 500 | +22% |
| MethodCall, 50 posts | 40,162 | 44,360 | +10% |
| MethodCall, 2300 posts | 937 | 1,128 | +20% |
| JSON column, 50 posts | 21,386 | 25,227 | +18% |
| JSON column, 2300 posts | 476 | 595 | +25% |
| Except, 50 posts | 23,269 | 35,347 | +52% |
| Except, 2300 posts | 541 | 892 | +65% |
| Only, 50 posts | 28,035 | 46,516 | +66% |
| Only, 2300 posts | 624 | 1,253 | +101% |
| Aliases, 50 posts | 36,130 | 51,859 | +44% |
| Aliases, 2300 posts | 847 | 1,364 | +61% |

Allocations (allocs / retained)

| Benchmark | Master (C ext) | Ruby impl |
|---|---:|---:|
| Simple, 50 | 122 / 0 | 35 / 4 |
| Simple, 2300 | 4,620 / 0 | 35 / 4 |
| HasOne, 50 | 128 / 0 | 63 / 7 |
| HasOne, 2300 | 4,628 / 0 | 62 / 6 |
| HasMany, 50 | 558 / 100 | 438 / 106 |
| HasMany, 2300 | 24,928 / 4,500 | 18,058 / 4,506 |
| MethodCall, 50 | 22 / 0 | 37 / 2 |
| MethodCall, 2300 | 22 / 0 | 37 / 2 |
| JSON column, 50 | 170 / 0 | 86 / 5 |
| JSON column, 2300 | 6,920 / 0 | 2,335 / 5 |
| Except, 50 | 131 / 0 | 66 / 6 |
| Except, 2300 | 4,631 / 0 | 66 / 6 |
| Only, 50 | 31 / 0 | 66 / 4 |
| Only, 2300 | 31 / 0 | 66 / 4 |
| Aliases, 50 | 120 / 0 | 35 / 4 |
| Aliases, 2300 | 4,620 / 0 | 35 / 4 |

panko_object — Ruby Hash/Array output

| Benchmark | Master (C ext) | Ruby impl | Change |
|---|---:|---:|---:|
| Simple, 50 posts | 29,576 | 43,133 | +46% |
| Simple, 2300 posts | 627 | 1,031 | +64% |
| HasOne, 50 posts | 16,323 | 25,126 | +54% |
| HasOne, 2300 posts | 360 | 599 | +66% |
| Except, 50 posts | 17,247 | 25,951 | +50% |
| Except, 2300 posts | 389 | 626 | +61% |
| Only, 50 posts | 19,935 | 32,830 | +65% |
| Only, 2300 posts | 453 | 817 | +80% |
| Aliases, 50 posts | 29,659 | 42,098 | +42% |
| Aliases, 2300 posts | 628 | 1,032 | +64% |

plain_object — non-AR objects

| Benchmark | Master (C ext) | Ruby impl | Change |
|---|---:|---:|---:|
| Simple, 50 posts | 43,619 | 43,107 | −1% |
| Simple, 2300 posts | 1,022 | 1,030 | +1% |
| HasOne, 50 posts | 32,185 | 30,938 | −4% |
| HasOne, 2300 posts | 752 | 724 | −4% |
| MethodCall, 50 posts | 61,258 | 44,904 | −27% |
| MethodCall, 2300 posts | 1,510 | 1,049 | −31% |
| Except, 50 posts | 33,668 | 32,577 | −3% |
| Except, 2300 posts | 829 | 797 | −4% |
| Only, 50 posts | 54,527 | 49,843 | −9% |
| Only, 2300 posts | 1,370 | 1,220 | −11% |

object_writer — Oj::StringWriter layer

| Benchmark | Master (C ext) | Ruby impl | Change |
|---|---:|---:|---:|
| 1 prop, push_value | 9,484,472 | 9,390,096 | −1% |
| 2 props, push_value | 8,015,880 | 7,960,216 | −1% |
| 1 prop, push_key+push_value | 8,988,532 | 9,030,661 | 0% |
| 2 props, push_key+push_value | 7,172,322 | 7,388,748 | +3% |
| Nested object | 5,665,602 | 5,711,066 | +1% |

type_casts — selected highlights

| Type | Master (C ext) | Ruby impl | Change |
|---|---:|---:|---:|
| String TypeCast | 22,627,776 | 18,459,316 | −18% |
| String NoTypeCast | 56,552,541 | 31,026,084 | −45% |
| Integer TypeCast | 39,958,641 | 24,070,938 | −40% |
| Integer NoTypeCast | 56,278,885 | 32,497,737 | −42% |
| Float TypeCast | 29,732,308 | 20,437,117 | −31% |
| Float NoTypeCast | 50,937,768 | 33,383,268 | −34% |
| Boolean TypeCast | 40,481,193 | 5,393,576 | −87% |
| Boolean NoTypeCast | 55,342,118 | 32,013,417 | −42% |
| DateTime TypeCast | 24,132,573 | 10,609,659 | −56% |
| DateTime NoTypeCast | 10,735,715 | 9,903,270 | −8% |
| Date TypeCast | 2,755,819 | 2,672,488 | −3% |
| Date NoTypeCast | 14,129,258 | 12,562,597 | −11% |
| Decimal TypeCast | 3,050,253 | 2,977,445 | −2% |
| Decimal NoTypeCast | 6,254,275 | 6,405,401 | +2% |
| Json TypeCast | 3,243,300 | 3,248,083 | 0% |
| Json NoTypeCast | 54,932,784 | 15,904,468 | −71% |

components — Ruby impl internal benchmarks (no C ext equivalent)

| Component | ips |
|---|---:|
| Filters no-op | 1,172,151 |
| Filters :only array | 239,871 |
| Filters :only hash (nested) | 152,569 |
| RecordState fast path (same batch) | 9,935,600 |
| RecordState full path (same class) | 9,728,526 |
| RecordState class change | 66,959 |
| Descriptor no filters | 1,049,730 |
| Descriptor attribute filter | 399,191 |
| Descriptor association filters | 149,010 |

Summary

The Ruby implementation is faster than the C extension for all ActiveRecord serialization. On Ruby 4.0 with YJIT (the default), panko_json throughput improves by +10% to +101% across every benchmark. The largest gains appear at 2300 records: Only +101%, HasOne +68%, Simple +66%, Except +65%. Even the smallest gain (MethodCall, 50 posts) is +10%.

panko_object sees comparable gains (+42% to +80%), with a higher floor than panko_json but a lower peak. The Ruby Hash/Array output path benefits from the same YJIT optimizations. Only at 2300 records (+80%) is the standout.

plain_object is a regression. Non-AR serialization ranges from +1% to −31% relative to the C extension, with MethodCall taking the biggest hit (−27% to −31%). Simple attribute access is flat (±1%). This is the one end-to-end area where the C extension still wins.

object_writer is effectively unchanged (−1% to +3%). This is expected: ObjectWriter is a pure Ruby stack-based Hash/Array builder that is identical in both branches.

Type cast microbenchmarks are mostly slower in pure Ruby. Individual type conversions range from +2% to −87%, with Boolean TypeCast (−87%) and Json NoTypeCast (−71%) being the outliers, while Json TypeCast, Date TypeCast, and both Decimal cases are within 3% of the C extension. However, these microbenchmarks measure isolated type conversions at millions of iterations/sec, and the overhead does not show up in end-to-end ActiveRecord serialization, where the Ruby impl is faster overall.

Allocations are dramatically lower. The Ruby impl now allocates significantly fewer objects than the C extension in most benchmarks. For example, Simple 2300 drops from 4,620 to 35 allocations, a 99% reduction. Except 2300 drops from 4,631 to 66. HasMany 2300 drops from 24,928 to 18,058. The only cases where the Ruby impl allocates more are MethodCall (37 vs 22) and Only (66 vs 31), both small in absolute terms. Retained objects are slightly higher in the Ruby impl (2 to 7 more than master in every benchmark). This is a major improvement over the previous iteration, where the Ruby impl allocated ~50% more objects.
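
For anyone re-deriving the Change column or the allocation reduction quoted above, the arithmetic is a plain relative difference. This is only a sketch: `percent_change` is a hypothetical helper, not part of the benchmark suite, and the inputs are copied from the tables in this comment.

```ruby
# Hypothetical helper: relative change from `before` to `after`,
# rounded to a whole percent.
def percent_change(before, after)
  ((after - before) * 100.0 / before).round
end

simple_2300_ips    = percent_change(862, 1_434)  # panko_json, Simple, 2300 posts
only_2300_ips      = percent_change(624, 1_253)  # panko_json, Only, 2300 posts
simple_2300_allocs = percent_change(4_620, 35)   # allocations, Simple, 2300

puts simple_2300_ips     # prints 66
puts only_2300_ips       # prints 101
puts simple_2300_allocs  # prints -99
```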

yosiat force-pushed the ruby-impl-perf branch 3 times, most recently from 593c73d to 8be0e83 on March 30, 2026 20:51

rus-max commented Mar 31, 2026

📊 Benchmarks: Ruby 4.0.1, Rails 8.1.0
Base: Master without YJIT = 100%

| Benchmark | Master (no YJIT) | Master (YJIT) | PR (no YJIT) | PR (YJIT) |
|---|---:|---:|---:|---:|
| Simple 2300 | 100% | 105% (+5%) | 70% (−30%) | 147% (+47%) |
| Simple 50 | 100% | 107% (+7%) | 68% (−32%) | 133% (+33%) |
| HasOne 2300 | 100% | 138% (+38%) | 88% (−12%) | 215% (+115%) |
| HasOne 50 | 100% | 139% (+39%) | 85% (−15%) | 204% (+104%) |

• Master + YJIT: 5-39% speedup
• PR without YJIT: 12-32% slower
• PR + YJIT: 33-115% faster 🚀

yosiat commented Mar 31, 2026

Owner Author

@rus-max thanks for this comment. I was pretty sure that the benchmarks auto-enabled YJIT, but that is not the case. I updated the performance comment and the results are impressive. Hopefully I'll be able to release the Ruby version soon :)
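
A quick way to confirm whether a benchmark process actually runs with YJIT is to ask the VM directly. The sketch below is not taken from the benchmark suite; `yjit_status` and `enable_yjit_if_possible` are hypothetical helper names, and `RubyVM::YJIT.enable` is only available on Ruby 3.3+.

```ruby
# Hypothetical helper: report whether YJIT is compiled in and active.
def yjit_status
  return :unavailable unless defined?(RubyVM::YJIT)

  RubyVM::YJIT.enabled? ? :enabled : :disabled
end

# Hypothetical helper: turn YJIT on at runtime when the build supports it.
# Returns true if YJIT is enabled afterwards, false otherwise.
def enable_yjit_if_possible
  return false unless defined?(RubyVM::YJIT)
  return true if RubyVM::YJIT.enabled?
  return false unless RubyVM::YJIT.respond_to?(:enable) # Ruby 3.3+

  RubyVM::YJIT.enable
  RubyVM::YJIT.enabled?
end

puts "YJIT: #{yjit_status}"
```

Printing the status at the top of each benchmark run makes it obvious which of the four columns in the table above a given result belongs to.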

rus-max commented Apr 1, 2026

@yosiat Allocation benchmarks show no difference between Ruby impl and Ruby+YJIT.

yosiat commented Apr 1, 2026

Owner Author

@rus-max I am not too familiar with YJIT internals, but I don't expect it to reduce allocations.
Right now I am cleaning up and simplifying all the benchmark files. After that I'll work on allocations, where I think there are some easy wins.

yosiat force-pushed the ruby-impl-perf branch 12 times, most recently from d99c926 to 0d6f037 on April 5, 2026 11:30
yosiat added 9 commits April 11, 2026 16:11
…fter first call. +3.6% IPS

Result: {"status":"keep","total_ips":13443.41,"total_allocs":4818}
…aseline! Simple 2300: 209 ips (was 175)

Result: {"status":"keep","total_ips":15600.3,"total_allocs":4818}
…er baseline! Simple 2300: 265 ips

Result: {"status":"keep","total_ips":18463.56,"total_allocs":4818}
…11% over baseline. 14422 vs 12978

Result: {"status":"keep","total_ips":14421.88,"total_allocs":4818}
…allocs unchanged at 4818

Result: {"status":"keep","total_ips":17167.58,"total_allocs":4818}
…6% over baseline

Result: {"status":"keep","total_ips":17339.66,"total_allocs":4818}
…one call. +40.5% over baseline

Result: {"status":"keep","total_ips":18243.27,"total_allocs":4818}
yosiat added 28 commits April 11, 2026 16:11
The 5-argument form of bytesplice was added in Ruby 3.3. Fall back to
the 3-argument form with an explicit byteslice to support Ruby 3.2.
… C extension fallback

Deletes the unused Context class from context.rb, the @aliases attr_accessor
and apply_fields_filters method from SerializationDescriptor, the stale
aliases={} init in Serializer#inherited, and the C extension fallback branch
in Impl::Serializer#write_fields that referenced non-existent Panko._sd_set_writer
and Panko._write_attributes methods. Updates the one spec that asserted on aliases.
Move apply_filters, resolve_filters, apply_attribute_filters, and
apply_association_filters out of SerializationDescriptor into a new
Panko::Filters class backed by a single frozen instance (INSTANCE).
SerializationDescriptor#apply_filters becomes a one-liner delegation.
Adds spec/unit/panko/filters_spec.rb with full unit coverage.
- Remove @record_class ivar and all AR alias-resolution logic from Attribute
- Simplify invalidate! to take no arguments (just clears @type and @cached_writer)
- Move AR attribute_aliases resolution into ActiveRecord::Writer where it belongs
- Expose alias_name= writer (attr_accessor) and make name= public
- Add spec/unit/panko/attribute_spec.rb covering the new behaviour
Extract all record-level state (column_indexes, row, is_indexed_row,
attributes_hash, types, additional_types, values, etc.) from the
252-line Writer god object into a new RecordState class.

RecordState owns:
- setup(object) — handles the IndexedRow fast path (identity check on
  column_indexes) and the full initialization path; returns true when
  the record class changes so the caller knows to invalidate attributes
  and re-resolve AR aliases
- read_attribute(attribute) — non-indexed (Rails 7.x) value lookup with
  dirty-attributes-hash priority

Writer becomes a thin orchestrator: calls record_state.setup, handles
attribute invalidation and alias resolution on class change, then
delegates into the existing indexed-row fast paths using state exposed
via record_state accessors.

Adds spec/unit/panko/impl/record_state_spec.rb with 29 unit examples
covering initialization defaults, setup return values, fast-path
detection, class-change tracking, and read_attribute priority logic.
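
The setup contract described above (identity-checked fast path, `true` only on a class change) can be illustrated with a small stand-alone sketch. This is not the PR's RecordState: `RecordStateSketch`, `PostRow`, and `AuthorRow` are hypothetical stand-ins, and records are assumed to expose a `column_indexes` reader.

```ruby
# Hypothetical stand-ins for records backed by an indexed row.
PostRow   = Struct.new(:column_indexes)
AuthorRow = Struct.new(:column_indexes)

class RecordStateSketch
  attr_reader :last_record_class, :column_indexes

  def initialize
    @last_record_class = nil
    @column_indexes = nil
  end

  # Returns true when the record class differs from the previous call,
  # so the caller knows to invalidate cached attribute state.
  def setup(record)
    indexes = record.column_indexes

    # Fast path: the very same column_indexes object (identity, not
    # equality) is treated as "another record of the same batch".
    return false if indexes.equal?(@column_indexes)

    # Full path: remember the new indexes and track the class.
    @column_indexes = indexes
    changed = !record.class.equal?(@last_record_class)
    @last_record_class = record.class
    changed
  end
end

state   = RecordStateSketch.new
indexes = { "id" => 0, "title" => 1 }
state.setup(PostRow.new(indexes))          # first record: class change
state.setup(PostRow.new(indexes))          # same batch: fast path
state.setup(AuthorRow.new({ "id" => 0 }))  # different class: class change
```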
…ngine

The Impl namespace was a historical artifact from when this code was a
direct port of the C extension. Now that it is idiomatic Ruby, Engine
better reflects its role: the internal hot-path machinery that powers
serialization, distinct from the public Panko::* API surface.

- Rename lib/panko/impl/ → lib/panko/engine/ (git mv)
- Rename spec/unit/panko/impl/ → spec/unit/panko/engine/ (git mv)
- Replace all Panko::Impl → Panko::Engine references across lib/, spec/,
  benchmarks/, and panko_serializer.rb
- Update CLAUDE.md to reflect new paths and namespace
- Add .DS_Store to .gitignore
Replace 29 benchmark files with a clean 12-file structure:
- support/benchmark.rb: ~200 LOC infra with benchmark() and
  benchmark_with_records() API, BENCH=/SIZE=/PROFILE= env vars, YJIT
- support/setup.rb: SQLite in-memory DB with 2300 seed records
- support/datasets.rb: 4 reusable datasets (posts, authors, aliased, plain)
- 5 benchmark files: panko_json, panko_object, plain_object, object_writer,
  components (Filters, RecordState, SerializationDescriptor)
- type_casts/: per-provider files (generic, postgresql, mysql, sqlite)

Simplify Rakefile from PTY/JSON runner to simple system() calls.
Remove active_model_serializers and terminal-table dependencies.
… allocations

On Ruby 3.3+, bytesplice(dst_off, dst_len, src, src_off, src_len) copies
directly without allocating intermediate byteslice strings. This removes
~6,900 object allocations and ~276KB per 2,300-record serialization run.
Falls back to the 3-arg bytesplice + byteslice path on Ruby < 3.3.
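
A minimal sketch of the version-gated copy described above, assuming Ruby 3.2 or newer (where `String#bytesplice` exists). `copy_bytes` and `FIVE_ARG_BYTESPLICE` are hypothetical names chosen for illustration, not identifiers from this branch.

```ruby
# The 5-argument bytesplice form is available from Ruby 3.3.
FIVE_ARG_BYTESPLICE =
  Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("3.3")

# Hypothetical helper: overwrite dst[dst_off, dst_len] with
# src[src_off, src_len], working on byte offsets.
def copy_bytes(dst, dst_off, dst_len, src, src_off, src_len)
  if FIVE_ARG_BYTESPLICE
    # Copies directly from src without an intermediate String.
    dst.bytesplice(dst_off, dst_len, src, src_off, src_len)
  else
    # Ruby 3.2: slice first (allocates a temporary String), then splice.
    dst.bytesplice(dst_off, dst_len, src.byteslice(src_off, src_len))
  end
  dst
end

buf = +"0000-00-00"
copy_bytes(buf, 0, 4, "year=2026", 5, 4)
puts buf # prints 2026-00-00
```

The helper returns `dst` explicitly rather than relying on the return value of `bytesplice`.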
association().target returns nil when the association hasn't been loaded
yet. Check loaded? first and fall back to public_send for lazy loading.
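
The loaded-check can be sketched without ActiveRecord by using stand-in objects. `AssociationStub`, `PostStub`, and `read_association` are hypothetical names; only `association`, `loaded?`, `target`, and `public_send` mirror the calls named in the commit message.

```ruby
# Stand-in for an association proxy: `target` is nil until loaded.
AssociationStub = Struct.new(:target, :is_loaded) do
  def loaded?
    is_loaded
  end
end

# Stand-in for a model with one association called `author`.
class PostStub
  def initialize(loaded)
    @assoc = AssociationStub.new(loaded ? "cached author" : nil, loaded)
  end

  def association(_name)
    @assoc
  end

  # Plays the role of the association reader that triggers lazy loading.
  def author
    "lazily loaded author"
  end
end

# Use the in-memory target when it is loaded, otherwise call the reader.
def read_association(record, name)
  assoc = record.association(name)
  assoc.loaded? ? assoc.target : record.public_send(name)
end

puts read_association(PostStub.new(true), :author)   # prints cached author
puts read_association(PostStub.new(false), :author)  # prints lazily loaded author
```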
Break the monolithic 180-line method into 9 small methods with clear
responsibilities: write_attributes (dispatcher), handle_class_change,
resolve_type, write_value, write_indexed_with_hash, write_indexed_cached,
write_indexed_first_pass, write_non_indexed, and build_column_caches.

Add nil_safe_push? to all value writers, replacing the is_a? chain in
build_column_caches. Expose last_record_class on RecordState.
…. serialize)

The subtype branch in ValuesWriter::Writer#write never cached a writer on
the attribute, so the second record in a batch hit write_indexed_cached
with a nil writer_cache entry and raised NoMethodError.

Introduce SubtypeWriter to wrap the AR type and cache it like every other
writer. Also removes stale TODO comments from attribute.rb and writer.rb.
- design-choices.md: Replace C extension references with Ruby Engine
  architecture, document fast paths, value writers, and IndexedRow
- performance.md: Update benchmarks to Ruby 4.0.2/Rails 8.0, simplify
  to core JSON benchmarks, add instructions for running locally
Passing non-Array/Hash values to :only/:except now raises ArgumentError
instead of NoMethodError, giving callers a clear message.
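
As an illustration of that kind of guard, the sketch below validates a filter value up front. `validate_filter!` and its error message are hypothetical, not the code or wording used in this branch.

```ruby
# Hypothetical guard: accept nil, Array, or Hash; reject everything else
# with an ArgumentError that names the offending option.
def validate_filter!(name, value)
  return value if value.nil? || value.is_a?(Array) || value.is_a?(Hash)

  raise ArgumentError,
        "#{name} must be an Array or a Hash, got #{value.class}"
end

validate_filter!(:only, [:id, :title])     # returns the array unchanged
validate_filter!(:except, { author: [:id] }) # returns the hash unchanged

begin
  validate_filter!(:only, :id)
rescue ArgumentError => e
  puts e.message # prints only must be an Array or a Hash, got Symbol
end
```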
…imeWriter buffer corruption

The global @@writer singleton shared a DateTimeWriter with a mutable @buf
across all threads. Under Puma, concurrent requests could corrupt datetime
output by writing to the same buffer simultaneously. Each thread now gets
its own Writer instance via Thread.current.
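
A minimal sketch of the per-thread pattern, assuming a writer that owns a mutable scratch buffer. `WriterSketch` and the `:writer_sketch` key are hypothetical names. Note that `Thread.current[]` is fiber-local storage in Ruby, which is at least as isolated as per-thread storage.

```ruby
class WriterSketch
  attr_reader :buf

  def initialize
    # Mutable scratch buffer: safe only while a single thread uses it.
    @buf = +""
  end

  # Lazily create one instance per thread instead of sharing a global.
  def self.current
    Thread.current[:writer_sketch] ||= new
  end
end

main_writer  = WriterSketch.current
other_writer = Thread.new { WriterSketch.current }.value

puts main_writer.equal?(WriterSketch.current) # prints true
puts main_writer.equal?(other_writer)         # prints false
```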
…ociations

When a serializer declares has_one pointing to a plain method on the model
(not a real AR association), object.association() raises
AssociationNotFoundError. Rescue and fall back to public_send, matching
has_many behavior which already uses public_send directly.
…ocal caching

When Serializer.new is called without :only/:except/:context/:scope options,
SerializationDescriptor.build now returns a thread-local cached copy instead
of duplicating the descriptor on every call. This avoids allocating ~8 objects
per call (descriptor, 2 array dups, serializer instance, 3 association
duplicates with recursive sub-descriptors).

Each thread gets its own duplicate (created once, reused forever), keeping
the mutable per-call state (serializer @object, association Writer/RecordState)
thread-safe under Puma.

Also fixes a latent bug in RecordState#setup where the IndexedRow fast path
could crash when a reused RecordState transitions from an IndexedRow-backed
object to a non-indexed object.

Benchmark (panko_oj, single Game object, Rails 8.0):
  Unpersisted: 119k → 190k i/s (+59%)
  Persisted:    96k → 171k i/s (+78%)

Allocations per call: 79 → 23 objects (-71%)
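
The caching rule can be sketched as follows: options that customize the descriptor get a fresh copy, everything else reuses one per-thread duplicate. `DescriptorSketch`, `DescriptorCacheSketch`, and the cache key are hypothetical names and do not reproduce SerializationDescriptor.build.

```ruby
class DescriptorSketch
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Duplicates get their own attributes array.
  def initialize_copy(source)
    super
    @attributes = source.attributes.dup
  end
end

module DescriptorCacheSketch
  CUSTOMIZING_KEYS = %i[only except context scope].freeze

  def self.build(template, options = {})
    # Filters, context, or scope require a fresh, mutable copy.
    return template.dup if CUSTOMIZING_KEYS.any? { |key| options.key?(key) }

    # Otherwise reuse one duplicate per thread, created on first use.
    cache = (Thread.current[:descriptor_cache_sketch] ||= {})
    cache[template] ||= template.dup
  end
end

template = DescriptorSketch.new(%i[id title])
first    = DescriptorCacheSketch.build(template)
second   = DescriptorCacheSketch.build(template)
filtered = DescriptorCacheSketch.build(template, only: [:id])

puts first.equal?(second)   # prints true
puts first.equal?(filtered) # prints false
```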
Instead of creating a new Engine::Serializer on every serialize_to_json /
to_json call, cache it on the descriptor via engine_serializer. Since
descriptors are thread-local (from the previous commit), the cached engine
is also per-thread and safe to reuse.

Both write_fields and _serialize_many validate that the cached
attributes_writer matches the current object type via
AttributesWriter.writer_for, handling the edge case where a reused engine
encounters a different object type (e.g. AR model then Hash).

Also refactors AttributesWriter.create into writer_for (returns class) and
create (instantiates), enabling cheap is_a? checks without allocation.
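
The split between a class lookup and instantiation might look like the sketch below. `AttributesWriterSketch`, `HashWriter`, and `PlainWriter` are hypothetical names, and the dispatch rule (Hash versus everything else) is an assumption made for illustration.

```ruby
module AttributesWriterSketch
  class HashWriter; end
  class PlainWriter; end

  # Returns the writer class for an object without allocating anything.
  def self.writer_for(object)
    object.is_a?(Hash) ? HashWriter : PlainWriter
  end

  # Instantiates the writer chosen by writer_for.
  def self.create(object)
    writer_for(object).new
  end
end

cached = AttributesWriterSketch.create({ id: 1 })

# A reused engine can check whether its cached writer still fits:
puts cached.is_a?(AttributesWriterSketch.writer_for({ id: 2 }))   # prints true
puts cached.is_a?(AttributesWriterSketch.writer_for(Object.new))  # prints false
```

When the check fails, the caller would build a new writer with `create` for the new object type.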

Benchmark (panko_oj, single Game object, Rails 8.0):
  Unpersisted: 190k → 225k i/s (+18%)
  Persisted:   171k → 213k i/s (+25%)

Cumulative from baseline:
  Unpersisted: 120k → 225k i/s (+88%)
  Persisted:    96k → 213k i/s (+122%)

panko_json (array serialization): no change — one engine per batch.