This document provides a comprehensive performance and memory usage comparison between the php-judy extension and PHP arrays, based on our extensive benchmarking suite.
The Judy algorithm is a form of a trie or a radix tree, optimized for in-memory integer and string key-value storage. Unlike simple hash tables, Judy arrays are designed to be memory-efficient and to maintain sorted order. This is what gives them unique performance characteristics.
Judy's design uses a series of nodes that compress branches of the tree, which can lead to very low memory overhead, especially with dense key sets.
Because the keys are stored in a tree-like structure, iterating through them in sorted order is a native and highly performant operation. A hash table, by contrast, must first sort all keys before it can be traversed in order.
For dense, sequential keys, Judy arrays have excellent cache performance (cache-friendly) because related keys are stored in a contiguous manner in memory.
Modern data structures like Swiss tables (used in abseil and Folly) and Robin Hood hashing (used in C++ unordered_map) are highly optimized hash tables that are generally considered to be some of the fastest. They achieve their performance by minimizing cache misses and collisions.
Random Access: For random key lookups and insertions (the most common use case for a map or dictionary), highly-optimized hash tables will typically outperform Judy arrays. This is because they can find a key in near-constant time (O(1)), while Judy's lookup time is logarithmic with the number of bits in the key (O(logn) for a balanced trie).
Benchmarks: The most accurate performance metrics come from benchmarks that test specific real-world workloads, not just raw operations. Factors like key sparsity, key type (integer vs. string), and access patterns (random vs. sequential) can dramatically change the outcome. An algorithm that excels at one task might be slow at another.
Native PHP Arrays (Hash Tables): PHP arrays are implemented as a highly-optimized hash table. A hash table is designed for constant-time average lookups, denoted as O(1). It works by calculating a hash value from the key, which directly points to the memory location of the value. This process is extremely fast and doesn't depend on the total number of elements in the array.
Judy Arrays (Tries): Judy arrays are a type of trie or radix tree. To find a key, the algorithm must traverse down the tree, inspecting parts of the key at each node. This makes Judy's lookup time logarithmic, denoted as O(logn), where n is the number of bits in the key. While this is very efficient, it's inherently slower than the single-step lookup of a hash table for random access.
Why this impacts performance: The benchmark results clearly show the impact of this difference in random access patterns:
-
Random Lookups: The benchmarks show Judy is 7.5x slower than PHP arrays for random access. This is because each lookup requires a traversal of the trie, which involves multiple pointer dereferences and memory jumps, while a hash table typically performs a single, fast lookup.
-
Sequential Access: Judy performs much better in sequential access and range queries because its sorted trie structure is designed for this. When traversing in order, Judy benefits from cache locality, as it can access adjacent keys with minimal overhead. The performance gap for sequential access is only 4.2x slower, and for range queries, it's nearly competitive (1.1x slower).
In short, the performance difference is not a flaw but a direct consequence of Judy's design trade-offs. It sacrifices some random access speed to achieve exceptional memory efficiency and fast ordered operations, which native PHP arrays do not provide.
- Hardware: Tests run on modern x86_64 systems with sufficient RAM to avoid memory pressure
- Operating System: Linux (Docker containers for consistency)
- PHP Version: 8.x with Judy extension 2.4.0
- Test Methodology: Multiple iterations with statistical analysis (min/max/median/percentiles)
- Memory Measurement: Using
memory_get_usage(true)andJudy::memoryUsage()
Note: Results may vary based on hardware, system load, and PHP configuration. All benchmarks use the same Docker environment for consistency.
Use Judy Arrays When:
- ✅ Memory is constrained (2-4x less memory usage)
- ✅ Large datasets (> 1M elements) where memory efficiency matters
- ✅ Sequential access patterns and ordered iteration
- ✅ Range queries and ordered operations
- ✅ String key random access with Hash or Adaptive types (O(1) avg lookup)
Use PHP Arrays When:
- ❌ Random access patterns with integer keys (2-9x faster than Judy trie types)
- ❌ Small datasets (< 100k elements)
- ❌ Performance-critical random operations on integer keys
- ❌ Memory is not a constraint
Our benchmark suite tests multiple scenarios to provide realistic performance data:
examples/benchmark_ordered_data.php- Sequential, clustered, and random key patternsexamples/benchmark_range_queries.php- Range queries and ordered operationsexamples/benchmark_real_world_patterns.php- Database, log, analytics patternsexamples/run_comprehensive_benchmarks.php- Complete benchmark suite
- Ordered Data Performance: Sequential keys, clustered keys, random keys
- Range Query Performance: Range operations, ordered iteration, navigation
- Real-world Patterns: Database primary keys, log data, analytics, session data
- Memory Efficiency: Memory usage across different patterns and sizes
| Dataset Size | Memory Savings | Performance Impact | Best Use Case | Recommendation |
|---|---|---|---|---|
| 100k | 12.5x less | 4.2x slower | Small datasets | |
| 500k | ~2.2x less | ~2-3x slower | Medium datasets | |
| 1M | ~2.2x less | ~3x slower | Large datasets | ✅ Use Judy |
| 10M | ~3.5x less | ~3-9x slower | Very large datasets | ✅ Use Judy |
Key Insight: Judy becomes more attractive as dataset size increases due to memory efficiency gains.
| Access Pattern | Judy Performance | PHP Performance | Judy vs PHP | Use Case |
|---|---|---|---|---|
| Random Access | 6.55ms | 0.87ms | 7.5x slower | ❌ Avoid Judy |
| Sequential Access | 3.62ms | 0.87ms | 4.2x slower | |
| Judy Iterator | 20.13ms | 1.79ms | 11.2x slower | |
| Range Queries | ~3.2ms | ~2.8ms | 1.1x slower | ✅ Judy strength |
Key Insight: Judy excels at range queries and sequential access. Iterator performance depends on dataset size - faster than sequential for large sparse datasets, slower for small sequential datasets.
Note: The 11.2x slower result is from a small (100K) sequential dataset where iterator overhead is significant. For large sparse datasets (>1M), iterators become competitive with sequential access.
Design Alignment: These results align perfectly with Judy's radix tree design - sequential access leverages cache locality, while random access requires tree traversal.
| Pattern Type | Judy Performance | Memory Efficiency | Use Case | Recommendation |
|---|---|---|---|---|
| Database Keys | Good | 2-3x less | Primary key storage | ✅ Use Judy |
| Log Data | Excellent | 3-4x less | Timestamp-based data | ✅ Use Judy |
| Analytics | Good | 2-3x less | Time-clustered data | ✅ Use Judy |
| Session Data | Good | 2-3x less | User-clustered data | ✅ Use Judy |
Key Insight: Judy performs well with real-world data patterns that have locality.
| Method | INT_TO_INT | STRING_TO_INT | Notes |
|---|---|---|---|
| PHP array (foreach assign) | 2.2 ms | 2.5 ms | Baseline |
Judy individual $j[$k] = $v |
4.7 ms | 19.3 ms | 2.1x / 7.7x slower than PHP |
| Judy putAll() | 5.5 ms | 18.6 ms | ~1.0x vs individual |
| Judy::fromArray() | 3.5 ms | 18.0 ms | 1.3x faster than individual (INT) |
Key Insight: fromArray() provides a meaningful speedup for integer-keyed types by avoiding per-element PHP method dispatch. For string-keyed types, the Judy tree traversal cost dominates.
| Method | INT_TO_INT | STRING_TO_INT | Notes |
|---|---|---|---|
PHP array ($a[$k] ?? null) |
0.23 ms | 0.26 ms | Baseline |
Judy individual $j[$k] |
0.30 ms | 0.89 ms | 1.3x / 3.4x slower than PHP |
| Judy getAll() | 0.16 ms | 0.80 ms | 1.9x / 1.1x faster than individual |
Key Insight: getAll() is significantly faster than individual lookups for integer keys (1.9x speedup) because it avoids per-element ArrayAccess overhead. For string keys the benefit is smaller but still measurable.
| Method | INT_TO_INT | STRING_TO_INT | Notes |
|---|---|---|---|
| Judy toArray() | 4.2 ms | 13.2 ms | Native C iteration |
| Judy manual foreach | 11.8 ms | 41.0 ms | PHP Iterator overhead |
| Speedup | 2.8x | 3.1x | toArray() avoids Iterator dispatch |
Key Insight: toArray() is 2-3x faster than building an array via foreach because it uses native C iteration internally, bypassing the PHP Iterator interface overhead.
| Method | INT_TO_INT | STRING_TO_INT | Notes |
|---|---|---|---|
PHP array $a[$k]++ |
1.7 ms | 2.8 ms | Baseline |
Judy manual $j[$k] = $j[$k] + 1 |
4.7 ms | 12.8 ms | Two traversals (read + write) |
| Judy increment() | 3.0 ms | 10.1 ms | Single traversal for INT (JLI); two for STRING (JSLG+JSLI) |
| increment() vs manual | 1.6x faster | 1.3x faster | Eliminates redundant lookup |
Key Insight: For INT_TO_INT, increment() achieves a true single-traversal update via JLI's insert-or-get semantics (1.6x speedup). For STRING_TO_INT, two traversals are needed (JSLG to check existence for counter tracking + JSLI to insert/update), still providing a 1.3x speedup by keeping all logic in C rather than PHP.
- Judy arrays provide 2-4x memory savings compared to PHP arrays
- Memory efficiency is consistent across all dataset sizes
- String-based Judy arrays show moderate memory savings with performance trade-offs
- Access Pattern Sensitivity: Judy's performance heavily depends on access patterns (see "The Key Difference: O(log n) vs O(1)" section above)
- Random Access: 2-9x slower than PHP arrays (Judy's weakness - O(logn) vs O(1))
- Sequential Access: 2-4x slower than PHP arrays (acceptable trade-off - leverages cache locality)
- Range Queries: Competitive with PHP arrays (Judy's strength - native sorted traversal)
- Iterator Performance: Has overhead but becomes more efficient at larger scales
- Key Type Impact: Integer keys consistently outperform string keys (radix tree optimization)
- Scale Impact: Performance gap increases with dataset size (memory efficiency becomes more valuable)
- Database Patterns: Judy performs well with sequential primary keys
- Log Data: Excellent for timestamp-based sequential data
- Analytics: Good for time-clustered data patterns
- Session Data: Effective for user-clustered data
1. Memory-Constrained Environments
- Shared hosting with limited memory
- Docker containers with memory limits
- Applications where memory usage is critical
- Benefit: 3.5x less memory usage than PHP arrays
2. Sequential Access Patterns
- Iterating through all keys/values
- Range queries and ordered operations
- Processing data in order
- Benefit: Acceptable performance with significant memory savings
3. Large Sparse Integer Datasets
- Datasets > 1M elements with sparse integer keys
- When memory efficiency outweighs random access performance
- Benefit: Excellent memory efficiency and sequential performance
1. Random Access Patterns
- Frequent lookups of specific keys
- Accessing keys in unpredictable order
- When random access performance is critical
- Reason: 2-9x slower than PHP arrays for random access
2. Small Datasets
- Datasets with < 100k elements
- When memory is not a constraint
- Reason: Overhead doesn't justify benefits
3. Performance-Critical String Operations
- High-frequency string key operations
- When performance is more important than memory
- Reason: Slower than PHP arrays for string keys
$judy = new Judy(Judy::INT_TO_INT);
// ... populate data ...
// Optimal: Use Judy's iterator
foreach ($judy as $key => $value) {
// Process each key-value pair
echo "$key => $value\n";
}$judy = new Judy(Judy::INT_TO_INT);
// ... populate data ...
// Good: Sort keys first, then access sequentially
$keys = array_keys($judy->toArray());
sort($keys);
foreach ($keys as $key) {
$value = $judy[$key];
// Process value
}// Use Judy for storage and sequential access
$judy = new Judy(Judy::INT_TO_INT);
// ... populate data ...
// Convert to PHP array for random access when needed
$php_array = $judy->toArray();
// Now you can do fast random access
$value = $php_array[50000]; // Fast random access$judy = new Judy(Judy::INT_TO_INT);
// ... populate data ...
// Avoid: Random access is very slow
$random_keys = [1000, 50000, 2000, 75000, 3000];
foreach ($random_keys as $key) {
$value = $judy[$key]; // Very slow!
}The memoryUsage() method reports internal Judy C library memory consumption for the array. Its behavior varies by Judy type because the underlying C library provides different memory-accounting macros for each array family.
| Judy Type | Underlying C Type | C Macro | memoryUsage() Return |
|---|---|---|---|
BITSET |
Judy1 | Judy1MemUsed (J1MU) |
int — bytes used |
INT_TO_INT |
JudyL | JudyLMemUsed (JLMU) |
int — bytes used |
INT_TO_MIXED |
JudyL | JudyLMemUsed (JLMU) |
int — bytes used (Judy storage only, excludes PHP zvals) |
INT_TO_PACKED |
JudyL | JudyLMemUsed (JLMU) |
int — bytes used (Judy storage only, excludes packed buffers) |
STRING_TO_INT |
JudySL | (no macro) | NULL |
STRING_TO_MIXED |
JudySL | (no macro) | NULL |
STRING_TO_INT_HASH |
JudyHS + JudySL | (no macro) | NULL |
STRING_TO_MIXED_HASH |
JudyHS + JudySL | (no macro) | NULL |
STRING_TO_INT_ADAPTIVE |
JudyL + JudyHS + JudySL | (no macro) | NULL |
STRING_TO_MIXED_ADAPTIVE |
JudyL + JudyHS + JudySL | (no macro) | NULL |
Why STRING types return NULL: The Judy C library does not provide a JudySLMemUsed macro. JudySL is internally a chain of JudyL arrays (one per byte of the key), making a single memory total impractical at the C level. Use memory_get_usage() deltas as an alternative for string-keyed types.
INT_TO_MIXED note: The value returned is only the JudyL trie memory. The PHP zval pointers stored in each slot consume additional PHP heap memory not reflected in memoryUsage().
For BITSET and INT_TO_* types, the Judy library maintains a TotalMemWords counter in the root JPM (Judy Population/Memory) structure. memoryUsage() reads this counter and multiplies by sizeof(Word_t) — an O(1) operation with no traversal.
$j = new Judy(Judy::INT_TO_INT);
$j->memoryUsage(); // 0 — empty array, no JPM allocated yet
for ($i = 0; $i < 10000; $i++) {
$j[$i] = $i;
}
$j->memoryUsage(); // e.g. 65536 — bytes used by Judy trie nodes
$j->free();
$j->memoryUsage(); // 0 — freedLarge sparse integer sets: Judy's compressed trie uses memory proportional to population, not key range. For sparse integer keys (e.g., [0, 1000000, 2000000, ...]), Judy is often more memory-efficient than a PHP array because its compressed structure only requires nodes for the populated keys.
10K elements with key step = 1000:
PHP array: ~530 KB
Judy INT_TO_INT: ~ 90 KB (5-6x less)
Judy BITSET: ~ 50 KB (10x less)
Dense integer counters: INT_TO_INT with sequential keys uses 2-4x less memory than a PHP array at large scale (100K+ elements), because Judy compresses dense key ranges into compact leaf nodes.
Bitset/presence tracking: BITSET stores only the bit index with no value storage. At 1M elements, a Judy BITSET uses ~10x less memory than $php_array[$i] = true.
Small datasets (< 1K elements): PHP's hash table has lower fixed overhead. The crossover point depends on key density, but Judy's memory advantage typically appears above ~1K elements.
Mixed-type values with frequent access: INT_TO_MIXED stores zval * pointers in JudyL slots, so the zval heap cost is identical to PHP arrays. The memoryUsage() value only reflects Judy trie overhead, not the stored values. For small-to-medium datasets, PHP arrays will use less total memory.
String-keyed lookups: JudySL's byte-by-byte trie traversal uses more memory per key than PHP's hash table for short, common-prefix-free keys.
Run the memory patterns benchmark to see results on your hardware:
php examples/judy-bench-memory-patterns.phpThis script compares Judy vs PHP arrays at 1K, 10K, 100K, and 1M elements across INT_TO_INT, BITSET, and STRING_TO_INT types, including sparse key scenarios.
# Run comprehensive benchmark suite (recommended)
php examples/run_comprehensive_benchmarks.php
# Run individual benchmark phases
php examples/benchmark_ordered_data.php
php examples/benchmark_range_queries.php
php examples/benchmark_real_world_patterns.php
# Run batch operations and increment benchmarks
php examples/judy-bench-batch-operations.php
# Run memory usage pattern comparison
php examples/judy-bench-memory-patterns.phpNote: All benchmarks use proper Iterator interface methods and run without deprecated warnings.
# Run original benchmarks (single iteration)
php examples/run-benchmarks.php
# Run robust benchmarks (multiple iterations)
php examples/run-benchmarks-robust.phpOur methodology and insights are informed by the Rusty Russell benchmark comparison between hashtables and Judy arrays, which demonstrates Judy's strengths in ordered access patterns and memory efficiency.
Judy Extension Version: 2.4.0 Last Updated: March 2026