RandomX v2 #317
Merged
30 commits
- 1115ec9 RandomX v2 virtual machine changes (tevador)
- fce33ee Interpreter support for v2, tests (SChernykh)
- da7af41 Enabled CI (SChernykh)
- ebe57ef RISC-V: added CFROUND v2 (SChernykh)
- 3f46cd7 RISC-V: added v2 FE mix code (hardware AES) (SChernykh)
- 8c80f3a RISC-V: added v2 FE mix code (software AES) (SChernykh)
- 19cd192 API to switch between v1 and v2 on the fly (SChernykh)
- 02f6c4d RISC-V: added v2 FE mix code (scalar software AES) (SChernykh)
- 4821dbf Set v2 program size to 384 (SChernykh)
- 5891f5d Improved RISC-V code (SChernykh)
- 67df5bd Updated documentation for v2 (SChernykh)
- 6d915f6 Improved RISC-V code (SChernykh)
- 1cefbd7 Added v2 design doc (SChernykh)
- 6346aa4 Added more CPU benchmarks (SChernykh)
- 4d3c4f2 Added prefetch tweak (x64 only for now) (SChernykh)
- 52406c6 Fixed v2 prefetch code (SChernykh)
- d2a2e93 Prefetch first 2 iterations of the loop (SChernykh)
- d82f065 Prefetch tweak (RISC-V) (SChernykh)
- ddfd0c4 Prefetch tweak (aarch64) (SChernykh)
- 0fcd8f6 Updated CPU tests (SChernykh)
- b4a374a Cleanup (SChernykh)
- 4bca419 Added more CPU tests (SChernykh)
- 1cd1e8f Update configuration.md (SChernykh)
- 6032448 Added 9950X tests (SChernykh)
- 1a9f14b ARM64: removed duplicate AES tables (SChernykh)
- 6dca84f Fixed typo (SChernykh)
- 3842469 ARM64: init AES pointers in a safe way (SChernykh)
- 3fd1d7c Clarified temporary register use (SChernykh)
- 5c00375 Implemented `rx_aligned_alloc` for portable fallback (SChernykh)
- 2b9ab3e Fixed misaligned access (SChernykh)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # RandomX v2 changes and their rationale | ||
|
|
||
| ## 1. CFROUND tweak | ||
|
|
||
| In RandomX v1, the CFROUND instruction changes the rounding mode on each main loop iteration. Unfortunately, x86 CPUs were not designed to change the rounding mode that often. As a result, this single instruction costs up to 10% of hashrate on Ryzen CPUs. This is where an ASIC or a specially designed CPU can get an easy advantage. | ||
|
|
||
| RandomX v2 reduces the impact massively: CFROUND will now change the rounding mode only every 16th time it executes (on average). | ||
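
A 1-in-16 average can be obtained by gating the mode change on four bits of the instruction's operand being zero. The Python sketch below assumes such a gate; the bit positions and the name `cfround_v2_sketch` are illustrative assumptions, not taken from the specification, which remains the normative reference.

```python
ROUNDING_MODES = 4  # two bits select one of four rounding modes


def cfround_v2_sketch(value: int, current_mode: int) -> int:
    """Return the rounding mode after one hypothetical v2-style CFROUND.

    Assumption: bits 2-5 of `value` act as a gate, and the mode is only
    updated when all four are zero (1 value in 16 for uniform input).
    """
    gate = (value >> 2) & 0xF
    if gate != 0:
        return current_mode            # common case: mode left untouched
    return value % ROUNDING_MODES      # rare case: low two bits pick the mode


# Count how often the mode would be written for all 12-bit inputs.
writes = sum(1 for v in range(1 << 12) if (v >> 2) & 0xF == 0)
print(writes, "of", 1 << 12)  # 256 of 4096
```

With a gate like this, the expensive mode switch happens for one input value in sixteen, while the instruction still depends on program data.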
|
|
||
| ## 2. AES tweak | ||
|
|
||
| The F and E registers are now mixed together with AES instead of XOR (step 10 in chapter 4.6.2 of the spec). | ||
|
|
||
| - The AES tweak doubles the amount of AES computation per hash without hurting the hashrate (it uses the gap in the RandomX main loop where the CPU was sitting idle, waiting for scratchpad data). | ||
| - It also introduces AES into the main RandomX loop, which makes it harder for specialized hardware to get away with just a dedicated circuit for scratchpad initialization: AES must be implemented as part of the RandomX VM and work with the VM's registers. | ||
| - It also improves the entropy of the data (makes it more random) before it's written to the scratchpad. | ||
|
|
||
| ## 3. Program size increase from 256 to 384 | ||
|
|
||
| CPUs have become much faster since the original RandomX was released. Back in 2019, the Ryzen 9 3950X was the fastest desktop CPU for RandomX, and at the time of writing (January 2026) it's the Ryzen 9 9950X. In most CPU benchmarks, the 9950X is more than 1.5x faster on average, thanks to a clock speed increase from < 4 GHz to > 5 GHz and to IPC improvements. | ||
|
|
||
| But in RandomX it's only 20-25% faster, because it's bottlenecked by RAM latency. While CPU cores got faster over the years, RAM latency stayed basically the same: tuned DDR4 memory from 2019 and tuned DDR5 memory from 2026 both have an access latency of around 50-55 ns. | ||
|
|
||
| This imbalance is the main reason for the program size increase - Zen5 and newer CPUs need more work to keep themselves busy while they're waiting for data from memory. | ||
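
The imbalance can be made concrete by converting the memory latency into core clock cycles. The Python sketch below uses 4.0 GHz and 5.0 GHz as round stand-ins for the "< 4" and "> 5" GHz figures above; those exact clocks and the helper name `stall_cycles` are assumptions for illustration, and the IPC gains mentioned above would widen the gap further.

```python
def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Core clock cycles that elapse during one memory access."""
    return latency_ns * clock_ghz  # ns * (cycles per ns)


for label, ghz in (("~4 GHz core", 4.0), ("~5 GHz core", 5.0)):
    low, high = stall_cycles(50, ghz), stall_cycles(55, ghz)
    print(f"{label}: {low:.0f}-{high:.0f} cycles per access")
# ~4 GHz core: 200-220 cycles per access
# ~5 GHz core: 250-275 cycles per access

# Program size ratio chosen for v2.
print(384 / 256)  # 1.5
```

The same 50-55 ns of latency costs a faster core more cycles, so a larger program gives it more independent work to fill that time.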
|
|
||
| ## 4. Prefetch two main loop iterations ahead instead of just one | ||
|
|
||
| RandomX v1 prefetches data from the dataset one iteration ahead. RandomX v2 increases it to two iterations by redefining the prefetch logic (see the `mp` register in specs.md). | ||
|
|
||
| This change complements the program size increase tweak and has the same purpose. | ||
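
The effect of the prefetch distance can be pictured with a toy schedule. The Python sketch below is a generic illustration of prefetching N iterations ahead, not a model of the actual `mp` register logic; the function name `prefetch_schedule` and the queue-based bookkeeping are assumptions made for the example.

```python
from collections import deque


def prefetch_schedule(iterations: int, distance: int):
    """Return (issued_at, consumed_at) pairs for a loop that requests
    data `distance` iterations before it is read."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    pending = deque()
    schedule = []
    for i in range(iterations):
        # Data requested `distance` iterations ago is consumed now.
        if len(pending) == distance:
            schedule.append((pending.popleft(), i))
        # The current iteration issues a request for a future one.
        pending.append(i)
    return schedule


print(prefetch_schedule(4, 1))  # [(0, 1), (1, 2), (2, 3)]
print(prefetch_schedule(4, 2))  # [(0, 2), (1, 3)]
```

With a distance of two, every request has two full loop iterations to complete before its data is needed, which is the extra slack this tweak is after.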
|
|
||
| ## 5. Performance impact | ||
|
|
||
| Tests show that RandomX v2, while being more than 1.5 times "heavier" than RandomX v1, results in only a slight hashrate reduction but massive efficiency improvements (in terms of VM+AES instructions per Joule): | ||
|
|
||
| ### AMD Ryzen 9 9950X (Zen 5) @ 285W (PBO max) | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|27186.1|100.0%|95.38|121.15e9|425.1e6|100.0%| | ||
| RandomX v2|26791.7|98.55%|94.01|182.61e9|640.72e6|**150.72%**| | ||
|
|
||
| ### AMD Ryzen 9 9950X (Zen 5) @ 100W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|19912.2|100.0%|199.122|88.74e9|887.38e6|100.0%| | ||
| RandomX v2|17346.2|87.11%|173.462|118.23e9|1182.27e6|**133.23%**| | ||
|
|
||
| ### AMD Ryzen AI 9 HX 370 (Zen 5), DDR5-5600 @ 28W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|6597.15|100.0%|235.61|29.4e9|1050e6|100.0%| | ||
| RandomX v2|7121.69|107.95%|254.35|48.54e9|1733.56e6|**165.1%**| | ||
|
|
||
| ### Ryzen AI 9 365 (Zen 5) @ 28W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|6091|100.0%|217.5|27.14e9|969.44e6|100.0%| | ||
| RandomX v2|6649|109.2%|237.5|45.32e9|1618.5e6|**166.95%**| | ||
|
|
||
| ### Ryzen 9 7945HX (Zen 4) @ 62W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|16126|100.0%|260.1|71.86e9|1159.11e6|100.0%| | ||
| RandomX v2|15308|94.9%|246.9|104.33e9|1682.83e6|**145.18%**| | ||
|
|
||
| ### Ryzen 5 8600G (Zen 4) @ 45W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|5876.47|100.0%|130.59|26.19e9|581.96e6|100.0%| | ||
| RandomX v2|5375.29|91.5%|119.45|36.64e9|814.15e6|**139.9%**| | ||
|
|
||
| ### Ryzen 9 5950X (Zen 3) @ 122-126W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|14745.9 @ 126W|100.0%|117.03|65.71e9|521.54e6|100.0%| | ||
| RandomX v2|12905.3 @ 122W|87.5%|105.78|87.96e9|720.98e6|**138.2%**| | ||
|
|
||
| ### Ryzen 9 3950X (Zen 2) @ 131W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|15049.34|100.0%|114.88|67.07e9|511.96e6|100.0%| | ||
| RandomX v2|13868.64|92.15%|105.87|94.53e9|721.57e6|**140.94%**| | ||
|
|
||
| ### Ryzen 7 3700X (Zen 2) @ 88W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|8624|100.0%|98|38.43e9|436.73e6|100.0%| | ||
| RandomX v2|7361|85.35%|83.65|50.17e9|570.12e6|**130.54%**| | ||
|
|
||
| ### Ryzen 7 1700X (Zen 1) @ 95W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|4832.73|100.0%|50.87|21.54e9|226.7e6|100.0%| | ||
| RandomX v2|4870.41|100.78%|51.27|33.2e9|349.43e6|**154.13%**| | ||
|
|
||
| ### Intel Core i9-12900K @ 125W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|8644.47|100.0%|69.16|38.52e9|308.19e6|100.0%| | ||
| RandomX v2|8310.78|96.14%|66.49|56.64e9|453.15e6|**147.04%**| | ||
|
|
||
| ### Intel Core i7-8650U @ 15W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|1831.15|100.0%|122.08|8.16e9|544.03e6|100.0%| | ||
| RandomX v2|1415|77.27%|94.33|9.64e9|642.95e6|**118.18%**| | ||
|
|
||
| ### Intel Core i7-6820HQ @ 45W | ||
| |Algorithm|Hashrate|Relative speed|Hash/Joule|VM+AES/s|VM+AES/Joule|Relative work/Joule| | ||
| |-|-|-|-|-|-|-| | ||
| RandomX v1|1968.56|100.0%|43.75|8.77e9|194.95e6|100.0%| | ||
| RandomX v2|1488.25|75.6%|33.07|10.14e9|225.41e6|**115.62%**| |
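
The derived columns in these tables can be cross-checked from the measured ones. The Python sketch below does this for the first table (Ryzen 9 9950X @ 285W), assuming that Hash/Joule and VM+AES/Joule are the per-second figures divided by the stated package power; that definition is inferred from the numbers rather than stated in the document, and `derived_columns` is a hypothetical helper name.

```python
def derived_columns(hashrate: float, vm_aes_per_s: float, watts: float) -> dict:
    """Per-Joule figures from per-second figures and power draw."""
    return {
        "hash_per_joule": hashrate / watts,
        "work_per_joule": vm_aes_per_s / watts,
    }


# Measured values copied from the 9950X @ 285W table.
v1 = derived_columns(27186.1, 121.15e9, 285.0)
v2 = derived_columns(26791.7, 182.61e9, 285.0)

relative_speed = 26791.7 / 27186.1
relative_work = v2["work_per_joule"] / v1["work_per_joule"]

print(f"relative speed:      {relative_speed:.2%}")
print(f"relative work/Joule: {relative_work:.2%}")
```

The results agree with the table to within the rounding of the published inputs, which supports reading "Relative work/Joule" as the ratio of the two VM+AES/Joule values.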
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,12 @@ | ||
| ;# restore callee-saved registers - System V AMD64 ABI | ||
| -pop r15 | ||
| -pop r14 | ||
| -pop r13 | ||
| -pop r12 | ||
| -pop rbp | ||
| -pop rbx | ||
| +mov r15, qword ptr [rsp+280] | ||
| +mov r14, qword ptr [rsp+272] | ||
| +mov r13, qword ptr [rsp+264] | ||
| +mov r12, qword ptr [rsp+256] | ||
| +mov rbp, qword ptr [rsp+232] | ||
| +mov rbx, qword ptr [rsp+224] | ||
|
|
||
| +add rsp, 456 | ||
|
|
||
| ;# program finished | ||
| -ret 0 | ||
| +ret | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,24 +1,24 @@ | ||
| ;# restore callee-saved registers - Microsoft x64 calling convention | ||
| -movdqu xmm15, xmmword ptr [rsp] | ||
| -movdqu xmm14, xmmword ptr [rsp+16] | ||
| -movdqu xmm13, xmmword ptr [rsp+32] | ||
| -movdqu xmm12, xmmword ptr [rsp+48] | ||
| -movdqu xmm11, xmmword ptr [rsp+64] | ||
| -add rsp, 80 | ||
| -movdqu xmm10, xmmword ptr [rsp] | ||
| -movdqu xmm9, xmmword ptr [rsp+16] | ||
| -movdqu xmm8, xmmword ptr [rsp+32] | ||
| -movdqu xmm7, xmmword ptr [rsp+48] | ||
| -movdqu xmm6, xmmword ptr [rsp+64] | ||
| -add rsp, 80 | ||
| -pop r15 | ||
| -pop r14 | ||
| -pop r13 | ||
| -pop r12 | ||
| -pop rsi | ||
| -pop rdi | ||
| -pop rbp | ||
| -pop rbx | ||
| +movdqa xmm15, xmmword ptr [rsp+432] | ||
| +movdqa xmm14, xmmword ptr [rsp+416] | ||
| +movdqa xmm13, xmmword ptr [rsp+400] | ||
| +movdqa xmm12, xmmword ptr [rsp+384] | ||
| +movdqa xmm11, xmmword ptr [rsp+368] | ||
| +movdqa xmm10, xmmword ptr [rsp+352] | ||
| +movdqa xmm9, xmmword ptr [rsp+336] | ||
| +movdqa xmm8, xmmword ptr [rsp+320] | ||
| +movdqa xmm7, xmmword ptr [rsp+304] | ||
| +movdqa xmm6, xmmword ptr [rsp+288] | ||
| +mov r15, qword ptr [rsp+280] | ||
| +mov r14, qword ptr [rsp+272] | ||
| +mov r13, qword ptr [rsp+264] | ||
| +mov r12, qword ptr [rsp+256] | ||
| +mov rdi, qword ptr [rsp+248] | ||
| +mov rsi, qword ptr [rsp+240] | ||
| +mov rbp, qword ptr [rsp+232] | ||
| +mov rbx, qword ptr [rsp+224] | ||
|
|
||
| +add rsp, 456 | ||
|
|
||
| ;# program finished | ||
| ret |
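
The restore offsets in both epilogues appear to follow one shared stack-frame layout: eight 8-byte general-purpose slots starting at `rsp+224`, followed by ten 16-byte XMM slots. The Python sketch below recomputes the offsets from that reading. What occupies the first 224 bytes is not shown in this excerpt, and treating the final 8 bytes of the 456-byte frame as alignment padding is an assumption; `frame_layout` is a hypothetical helper, not code from the repository.

```python
GPR_SLOTS = ["rbx", "rbp", "rsi", "rdi", "r12", "r13", "r14", "r15"]
XMM_SLOTS = [f"xmm{i}" for i in range(6, 16)]
FRAME_SIZE = 456  # from `add rsp, 456`
BASE = 224        # first save slot seen in the diffs


def frame_layout(base: int = BASE):
    """Map each saved register to its rsp-relative offset."""
    offsets, off = {}, base
    for reg in GPR_SLOTS:   # 8 bytes per general-purpose register
        offsets[reg] = off
        off += 8
    for reg in XMM_SLOTS:   # 16 bytes per XMM register
        offsets[reg] = off
        off += 16
    return offsets, off


layout, end = frame_layout()
# The System V epilogue restores a subset: rsi/rdi are not callee-saved there.
sysv = {r: layout[r] for r in ("rbx", "rbp", "r12", "r13", "r14", "r15")}

print(layout["xmm6"], layout["xmm15"], end, FRAME_SIZE - end)
# 288 432 448 8
# Every XMM slot offset is a multiple of 16, which movdqa needs when rsp is 16-byte aligned.
print(all(layout[x] % 16 == 0 for x in XMM_SLOTS))  # True
```

Because every register has a fixed slot, the epilogue can restore everything with plain loads and release the frame with a single `add rsp, 456`, instead of interleaving `pop` instructions with stack adjustments.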