Merged
2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rlnc"
-version = "0.8.6"
+version = "0.8.7"
edition = "2024"
resolver = "3"
rust-version = "1.89.0"
16 changes: 3 additions & 13 deletions README.md
@@ -157,27 +157,17 @@ For visualizing benchmark results, run following command, which will produce PNG
make bench_then_plot # Only runs with `default` features enabled
```

-### On 12th Gen Intel(R) Core(TM) i7-1260P
-
-Running benchmarks on `Linux 6.14.0-27-generic x86_64`, compiled with `rustc 1.88.0 (6b00bc388 2025-06-23)`.
-
-Component | Peak Median Throughput (`default` feature) | Peak Median Throughput (`parallel` feature) | Impact of number of pieces on performance
---- | --- | --- | ---
-Full RLNC Encoder | **30.14 GiB/s** | **23.39 GiB/s** | The number of pieces original data got split into has a **minimal** impact on the encoding speed.
-Full RLNC Recoder | **27.26 GiB/s** | **12.63 GiB/s** | Similar to the encoder, the recoder's performance remains largely consistent regardless of how many pieces the original data is split into.
-Full RLNC Decoder | **1.59 GiB/s** | **Doesn't yet implement a parallel decoding mode** | As the number of pieces increases, the decoding time increases substantially, leading to a considerable drop in throughput. This indicates that decoding is the most computationally intensive part of the full RLNC scheme, and its performance is inversely proportional to the number of pieces.
-
-In summary, the full RLNC implementation demonstrates excellent encoding and recoding speeds, consistently achieving GiB/s throughputs with minimal sensitivity to the number of data pieces. The `parallel` feature, leveraging Rust `rayon` data-parallelism framework, also provides good performance for both encoding and recoding. Whether you want to use that feature, completely depends on your usecase. However, decoding remains a much slower operation, with its performance significantly diminishing as the data is split into a greater number of pieces, and currently does **not** implement a parallel decoding algorithm.
+More performance benchmarking results are available in the README inside the [./plots](./plots) directory.

## Usage

To use the `rlnc` library crate in your Rust project, add it as a dependency in your `Cargo.toml`:

```toml
[dependencies]
-rlnc = "=0.8.6" # On x86_64 and aarch64 targets, it offers fast encoding, recoding and decoding, using SIMD intrinsics.
+rlnc = "=0.8.7" # On x86_64 and aarch64 targets, it offers fast encoding, recoding and decoding, using SIMD intrinsics.
# or
-rlnc = { version = "=0.8.6", features = ["parallel"] } # Uses `rayon`-based data-parallelism for fast encoding and recoding. Note: this feature doesn't yet parallelize RLNC decoding.
+rlnc = { version = "=0.8.7", features = ["parallel"] } # Uses `rayon`-based data-parallelism for fast encoding and recoding. Note: this feature doesn't yet parallelize RLNC decoding.

rand = { version = "=0.9.2" } # Required for random number generation
```
98 changes: 98 additions & 0 deletions plots/README.md
@@ -0,0 +1,98 @@
# Plotted Performance Benchmark Results

The following plots were generated by running the make recipe below from the root of the repository, on machines with the configurations specified in each section.

> [!NOTE]
> These benchmark results don't capture the performance of the RLNC encoder, recoder and decoder with the `parallel` feature enabled.

```bash
make bench_then_plot
```

## Performance Characteristics

Algorithm | Characteristics
--- | ---
Encoding | The number of pieces the original data is split into has a **minimal** impact on encoding speed.
Recoding | Recoding is a wrapper over encoding, with an additional matrix-vector multiplication. As the number of pieces increases, the dimension of that multiplication grows too, raising the computational cost of recoding.
Decoding | As the number of pieces increases, decoding time increases substantially and throughput drops considerably. Decoding is the most computationally intensive part of the full RLNC scheme, and its throughput falls roughly in inverse proportion to the number of pieces.
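
To make the recoding row more concrete, here is a minimal sketch of the extra matrix-vector multiplication it refers to. This is not the crate's actual code: the function names are hypothetical, and both the use of GF(2^8) and the reduction polynomial `0x11B` are assumptions made for illustration. A recoder holding `m` coded pieces, each carrying an `n`-element coding vector, combines those vectors with `m` scalars, so the work grows with the number of pieces.

```rust
/// Multiply two elements of GF(2^8) with shift-and-add multiplication.
/// The reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is an assumption.
fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            acc ^= a; // addition in GF(2^8) is XOR
        }
        let carry = a & 0x80 != 0;
        a <<= 1; // multiply `a` by x; the top bit falls off
        if carry {
            a ^= 0x1B; // reduce modulo the field polynomial
        }
        b >>= 1;
    }
    acc
}

/// Hypothetical helper: combine the coding vectors of `m` received pieces
/// (the rows of `coding_vectors`, each of length `n`) using `m` scalars.
/// This performs `m * n` field multiplications.
fn recode_coding_vector(scalars: &[u8], coding_vectors: &[Vec<u8>]) -> Vec<u8> {
    assert_eq!(scalars.len(), coding_vectors.len());
    let n = coding_vectors.first().map_or(0, |row| row.len());
    let mut out = vec![0u8; n];
    for (&s, row) in scalars.iter().zip(coding_vectors) {
        for (o, &c) in out.iter_mut().zip(row) {
            *o ^= gf256_mul(s, c);
        }
    }
    out
}
```

For example, calling `recode_coding_vector(&[2, 3], &[vec![1, 0], vec![0, 1]])` combines two unit coding vectors and yields the scalars themselves as the new coding vector.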

In summary, this RLNC implementation demonstrates excellent encoding and recoding speeds, consistently achieving GiB/s throughput with relatively little sensitivity to the number of data pieces. The `parallel` feature, which leverages Rust's `rayon` data-parallelism framework, also provides good performance for both encoding and recoding; whether to use it depends entirely on your use case. Decoding, however, remains a much slower operation: its performance diminishes significantly as the data is split into more pieces, and there is currently **no** parallel decoding algorithm.
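
A rough cost model may help explain why decoding is so sensitive to the number of pieces. It assumes the decoder performs Gauss-Jordan elimination on the coefficient matrix augmented with the piece bytes, which this README does not state. The symbols $D$ (total data bytes), $n$ (number of pieces) and $\ell$ (bytes per piece) are introduced only for this estimate, and costs are counted in byte-wise field multiplications.

$$
\begin{aligned}
\ell &= D / n && \text{(bytes per piece)} \\
C_{\text{encode}} &\approx n \cdot \ell = D && \text{(producing one coded piece)} \\
C_{\text{decode}} &\approx n^2 (n + \ell) = n^3 + n D && \text{(recovering all } n \text{ pieces)}
\end{aligned}
$$

Under these assumptions, and when $\ell \gg n$, recovering $D$ bytes costs about $n$ times as much as producing one coded piece from $D$ bytes, so decoding throughput would fall roughly as $1/n$.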

## On 12th Gen Intel(R) Core(TM) i7-1260P

Running Linux kernel

```bash
$ uname -srm
Linux 6.17.0-5-generic x86_64
```

with Rust compiler

```bash
$ rustc --version
rustc 1.90.0 (1159e78c4 2025-09-14)
```

and following CPU feature flags

```bash
$ lscpu | awk -F': *' '/Flags/{gsub(" ", ", ", $2);print $2}'
fpu, vme, de, pse, tsc, msr, pae, mce, cx8, apic, sep, mtrr, pge, mca, cmov, pat, pse36, clflush, dts, acpi, mmx, fxsr, sse, sse2, ss, ht, tm, pbe, syscall, nx, pdpe1gb, rdtscp, lm, constant_tsc, art, arch_perfmon, pebs, bts, rep_good, nopl, xtopology, nonstop_tsc, cpuid, aperfmperf, tsc_known_freq, pni, pclmulqdq, dtes64, monitor, ds_cpl, vmx, smx, est, tm2, ssse3, sdbg, fma, cx16, xtpr, pdcm, pcid, sse4_1, sse4_2, x2apic, movbe, popcnt, tsc_deadline_timer, aes, xsave, avx, f16c, rdrand, lahf_lm, abm, 3dnowprefetch, cpuid_fault, epb, ssbd, ibrs, ibpb, stibp, ibrs_enhanced, tpr_shadow, flexpriority, ept, vpid, ept_ad, fsgsbase, tsc_adjust, bmi1, avx2, smep, bmi2, erms, invpcid, rdseed, adx, smap, clflushopt, clwb, intel_pt, sha_ni, xsaveopt, xsavec, xgetbv1, xsaves, split_lock_detect, user_shstk, avx_vnni, dtherm, ida, arat, pln, pts, hwp, hwp_notify, hwp_act_window, hwp_epp, hwp_pkg_req, hfi, vnmi, umip, pku, ospke, waitpkg, gfni, vaes, vpclmulqdq, rdpid, movdiri, movdir64b, fsrm, md_clear, serialize, arch_lbr, ibt, flush_l1d, arch_capabilities
```
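
The flag list above is relevant because the crate's README mentions SIMD intrinsics on x86_64 and aarch64 targets. As a hedged sketch, the same kind of information can be queried at runtime from Rust with the standard library's `is_x86_feature_detected!` macro. The function name is hypothetical, and the three extensions checked here are illustrative only; they are not taken from the crate's source.

```rust
/// Report whether a few x86 SIMD extensions are usable on the current CPU.
/// The set of extensions is illustrative, not the crate's actual list.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected_simd_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", std::is_x86_feature_detected!("sse2")),
        ("ssse3", std::is_x86_feature_detected!("ssse3")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
    ]
}

/// On other architectures the x86 detection macro is unavailable,
/// so this sketch simply reports nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected_simd_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}
```

Iterating over `detected_simd_features()` and printing each pair gives a quick cross-check against the `lscpu` output for the extensions of interest.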

### Encoder

![benchmark_encode_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_encode_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

![benchmark_encode_zero_alloc_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_encode_zero_alloc_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

### Recoder

![benchmark_recode_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_recode_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

![benchmark_recode_zero_alloc_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_recode_zero_alloc_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

### Decoder

![benchmark_decode_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_decode_on_12th_Gen_IntelR_CoreTM_i7-1260P_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

## On AMD EPYC 9R14 (AWS EC2 `m7a.large`)

Running Linux kernel

```bash
$ uname -srm
Linux 6.14.0-1011-aws x86_64
```

with Rust compiler

```bash
$ rustc --version
rustc 1.90.0 (1159e78c4 2025-09-14)
```

and following CPU feature flags

```bash
$ lscpu | awk -F': *' '/Flags/{gsub(" ", ", ", $2);print $2}'
fpu, vme, de, pse, tsc, msr, pae, mce, cx8, apic, sep, mtrr, pge, mca, cmov, pat, pse36, clflush, dts, acpi, mmx, fxsr, sse, sse2, ss, ht, tm, pbe, syscall, nx, pdpe1gb, rdtscp, lm, constant_tsc, art, arch_perfmon, pebs, bts, rep_good, nopl, xtopology, nonstop_tsc, cpuid, aperfmperf, tsc_known_freq, pni, pclmulqdq, dtes64, monitor, ds_cpl, vmx, smx, est, tm2, ssse3, sdbg, fma, cx16, xtpr, pdcm, pcid, sse4_1, sse4_2, x2apic, movbe, popcnt, tsc_deadline_timer, aes, xsave, avx, f16c, rdrand, lahf_lm, abm, 3dnowprefetch, cpuid_fault, epb, ssbd, ibrs, ibpb, stibp, ibrs_enhanced, tpr_shadow, flexpriority, ept, vpid, ept_ad, fsgsbase, tsc_adjust, bmi1, avx2, smep, bmi2, erms, invpcid, rdseed, adx, smap, clflushopt, clwb, intel_pt, sha_ni, xsaveopt, xsavec, xgetbv1, xsaves, split_lock_detect, user_shstk, avx_vnni, dtherm, ida, arat, pln, pts, hwp, hwp_notify, hwp_act_window, hwp_epp, hwp_pkg_req, hfi, vnmi, umip, pku, ospke, waitpkg, gfni, vaes, vpclmulqdq, rdpid, movdiri, movdir64b, fsrm, md_clear, serialize, arch_lbr, ibt, flush_l1d, arch_capabilities
```

### Encoder

![benchmark_encode_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_encode_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

![benchmark_encode_zero_alloc_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_encode_zero_alloc_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

### Recoder

![benchmark_recode_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_recode_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

![benchmark_recode_zero_alloc_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_recode_zero_alloc_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14.png)

### Decoder

![benchmark_decode_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14](./benchmark_decode_on_AMD_EPYC_9R14_with_rustc_1.90.0_1159e78c4_2025-09-14.png)