DuckDB CUDA Silent Payments Extension

A high-performance DuckDB extension that provides GPU-accelerated Bitcoin Silent Payments (BIP-352) scanning using NVIDIA CUDA. This extension enables efficient scanning of large transaction datasets by leveraging GPU parallel processing for elliptic curve cryptography operations.

Features

GPU Acceleration: Utilizes NVIDIA CUDA for parallel elliptic curve multiplication
Multi-GPU Support: Automatically distributes workload across multiple GPUs
High Throughput: Processes roughly 2.0 to 2.6 million transactions per second on dual RTX 5090 GPUs (see Throughput Benchmarks)
Optimized Batching: Configurable batch sizes for optimal GPU utilization
Thread-Safe: Concurrent multi-user access supported
Memory Efficient: Handles databases with 100M+ rows

Building the Extension

Prerequisites

CMake 3.18 or higher
C++ compiler with C++17 support
NVIDIA GPU with compute capability 8.0+ (Ampere, Ada Lovelace, Hopper, or Blackwell)
CUDA Toolkit 12.8 or 13.0
Python 3 (for gECC constant generation)
Git

Supported GPUs:

NVIDIA A100 (compute capability 8.0)
NVIDIA RTX 30xx series (compute capability 8.6)
NVIDIA RTX 40xx series (compute capability 8.9)
NVIDIA RTX 50xx series (compute capability 12.0)
NVIDIA H100/H200 (compute capability 9.0)

Build Steps

Clone the repository:

git clone --recursive https://github.com/sparrowwallet/duckdb-cudasp-extension.git
cd duckdb-cudasp-extension

Set CUDA environment variables (if necessary):

export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Build the extension:

make clean
make

Run tests:

make test

The compiled extension is written to build/release/extension/cudasp/cudasp.duckdb_extension. The DuckDB binary built alongside it, build/release/duckdb, starts with the extension already loaded.

Loading the Extension

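Locally built extensions are unsigned, so a standard DuckDB binary must be started with unsigned extensions allowed (for example, duckdb -unsigned) before running:

LOAD 'path/to/cudasp.duckdb_extension';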

Functions

Primary Function

`cudasp_scan(input_table, scan_private_key, spend_public_key, label_keys, batch_size := 300000)`

Scans a table of Bitcoin transactions for Silent Payments (BIP-352) matches using GPU acceleration. This function implements the complete Silent Payments scanning algorithm with optimized elliptic curve operations.

Parameters:

input_table (TABLE): Input table with columns:
- txid (BLOB): 32-byte transaction ID
- height (INTEGER): Block height
- tweak_key (BLOB): 64-byte uncompressed EC point (32-byte x || 32-byte y, little-endian)
- outputs (BIGINT[]): Array of output key prefixes (the first 8 bytes of each output's x-coordinate, read as a big-endian integer)
scan_private_key (BLOB): 32-byte scan private key (little-endian)
spend_public_key (BLOB): 64-byte uncompressed spend public key (32-byte x || 32-byte y, little-endian)
label_keys (BLOB[]): List of 64-byte uncompressed label public keys (may be empty)
batch_size (INTEGER, optional): Number of rows to process per GPU batch (default: 300000)
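Most Bitcoin tooling produces 33-byte compressed public keys and big-endian 32-byte values, so inputs usually need converting before they are loaded into the input table. The Python sketch below shows one way to do that. It assumes that "little-endian" in the parameter list means each 32-byte coordinate or scalar is byte-reversed independently, and that output prefixes are stored as signed BIGINTs (the first 8 big-endian bytes read as two's complement); check both assumptions against the extension's tests before relying on them. All function names are hypothetical.

```python
# Hypothetical helpers for turning common Bitcoin encodings into cudasp_scan inputs.
# Assumptions: each 32-byte coordinate/scalar is byte-reversed independently
# ("little-endian"), and outputs hold signed 64-bit prefixes (BIGINT).

P = 2**256 - 2**32 - 977  # secp256k1 field prime


def decompress(pubkey33):
    """Recover (x, y) from a 33-byte SEC1 compressed public key."""
    if len(pubkey33) != 33 or pubkey33[0] not in (2, 3):
        raise ValueError("expected a 33-byte compressed public key")
    x = int.from_bytes(pubkey33[1:], "big")
    rhs = (x * x * x + 7) % P
    y = pow(rhs, (P + 1) // 4, P)  # square root, valid because P % 4 == 3
    if y * y % P != rhs:
        raise ValueError("x-coordinate is not on secp256k1")
    if y % 2 != pubkey33[0] % 2:   # 0x02 -> even y, 0x03 -> odd y
        y = P - y
    return x, y


def to_point_blob(pubkey33):
    """64-byte x || y, each coordinate little-endian (tweak_key, spend/label keys)."""
    x, y = decompress(pubkey33)
    return x.to_bytes(32, "little") + y.to_bytes(32, "little")


def to_scalar_blob(seckey32_be):
    """Big-endian 32-byte private key -> little-endian scan_private_key."""
    if len(seckey32_be) != 32:
        raise ValueError("expected a 32-byte private key")
    return seckey32_be[::-1]


def output_prefix(xonly32):
    """First 8 bytes of a 32-byte x-only output key, read big-endian as a signed int64."""
    return int.from_bytes(xonly32[:8], "big", signed=True)
```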

Returns: TABLE with columns:

txid (BLOB): Transaction ID of matching transaction
height (INTEGER): Block height of matching transaction
tweak_key (BLOB): Tweak key that produced the match

Algorithm:

Batch Processing: Groups input rows into batches for efficient GPU processing
EC Multiplication: Computes tweak_key × scan_private_key for each row
Shared Secret: Hashes the resulting ECDH point with the BIP-352 tagged hash (SHA-256 with tag BIP0352/SharedSecret) to derive the output tweak
Fixed-Point Multiplication: Computes shared_secret × G using GPU-optimized fixed-point multiplication
Point Addition: Adds the spend public key to create candidate output keys
Label Checking: Tests the base candidate and, for each label key, the candidate plus that label key
Match Detection: Compares the first 8 bytes of each candidate's x-coordinate against the outputs list
Result Aggregation: Returns all matching transactions
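The steps above can be traced on a single row with a slow, pure-Python reference. This is a sketch for checking inputs and expected matches on the CPU, not the extension's GPU code. It assumes that tweak_key already includes BIP-352's input_hash factor (as in Silent Payments tweak indexes), that BLOBs have already been decoded into integers and (x, y) tuples, and it checks only output index k = 0; BIP-352 itself keeps incrementing k after each match. Function names are hypothetical.

```python
import hashlib

# secp256k1 parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Affine addition on secp256k1; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def point_mul(k, pt):
    """Double-and-add scalar multiplication (variable-time, illustration only)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result


def tagged_hash(tag, msg):
    """Tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    h = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(h + h + msg).digest()


def ser_p(pt):
    """33-byte compressed SEC1 serialization."""
    return bytes([2 + (pt[1] & 1)]) + pt[0].to_bytes(32, "big")


def x_prefix(pt):
    """First 8 bytes of the big-endian x-coordinate as a signed 64-bit integer."""
    return int.from_bytes(pt[0].to_bytes(32, "big")[:8], "big", signed=True)


def scan_row(tweak, scan_priv, spend_pub, labels, outputs, k=0):
    """Return True if a candidate output key for index k matches an entry in outputs."""
    ecdh = point_mul(scan_priv, tweak)                    # tweak_key x scan_private_key
    t_k = int.from_bytes(
        tagged_hash("BIP0352/SharedSecret", ser_p(ecdh) + k.to_bytes(4, "big")), "big")
    if not 0 < t_k < N:
        raise ValueError("invalid tweak (negligible probability)")
    p_k = point_add(spend_pub, point_mul(t_k, G))         # spend key + t_k * G
    candidates = [p_k] + [point_add(p_k, lab) for lab in labels]
    wanted = set(outputs)
    return any(c is not None and x_prefix(c) in wanted for c in candidates)
```

Because ECDH is symmetric (a·(r·G) = r·(a·G)), a sender holding only the scan public key derives the same tweak, which is what makes the x-prefix comparison meaningful.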

Example:

-- Create a table of transactions to scan
CREATE TABLE tweak AS
SELECT
    txid,
    height,
    tweak_key,
    outputs
FROM read_parquet('bitcoin_transactions.parquet');

-- Scan for silent payments
SELECT hex(txid), height
FROM cudasp_scan(
    (SELECT txid, height, tweak_key, outputs FROM tweak),
    from_hex('0f694e068028a717f8af6b9411f9a133dd3565258714cc226594b34db90c1f2c'),  -- scan_private_key
    from_hex('36cf8fcd4d4890ab6c1083aeb5b50c260c20acda7839120e3575836f6d85c95ce0d705e31ff9fdcce67a8f3598871c6dfbe6bcde8a51cb7b48b0f95be0ea94de'),  -- spend_public_key
    [from_hex('cd63f9212a2deebde8a71e9ea23f6f958c47c41d2ed74b9617fe6fb554d1524e292fabddbdcbb643eafc328875c46d75a1d697b2b31c42d38aa93f85eab34bc1')],  -- label_keys
    batch_size := 300000
);

Performance Characteristics

Throughput Benchmarks

Measured on dual RTX 5090 GPUs with batch_size = 300000:

Dataset Size	Processing Time	Throughput (tx/sec)
1 week (1M rows)	575ms	1,989,401
2 weeks (2.3M rows)	1.04s	2,265,266
4 weeks (5M rows)	2.28s	2,198,706
8 weeks (9.4M rows)	3.64s	2,596,475
32 weeks (32.7M rows)	12.5s	2,622,216

Multi-GPU Scaling

Single GPU: ~7.2 seconds for 1M rows
Dual GPU: ~6.1 seconds for 1M rows (~1.17× speedup)
The speedup is limited by the overhead of the serial table scan
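Amdahl's law gives a rough sense of how much serial overhead these timings imply. The sketch below solves t_N = t_1 (s + (1 - s) / N) for the serial fraction s using the single- and dual-GPU times quoted above. It is an idealized model that assumes the parallel portion splits perfectly across GPUs, so its outputs are illustrative estimates, not measurements; the function names are hypothetical.

```python
def serial_fraction(t_single, t_multi, n_gpus):
    """Solve Amdahl's law, t_multi = t_single * (s + (1 - s) / n), for the serial fraction s."""
    ratio = t_multi / t_single
    return (ratio - 1 / n_gpus) / (1 - 1 / n_gpus)


def amdahl_speedup(s, n_gpus):
    """Ideal speedup on n GPUs when a fraction s of the single-GPU time is serial."""
    return 1 / (s + (1 - s) / n_gpus)


s = serial_fraction(7.2, 6.1, 2)
print(f"implied serial fraction: {s:.2f}")                       # 0.69
print(f"predicted 4-GPU speedup: {amdahl_speedup(s, 4):.2f}x")  # 1.30x
print(f"ceiling for any GPU count: {1 / s:.2f}x")               # 1.44x
```

Under this model, adding GPUs alone would not push the speedup past about 1.44× for this workload, which fits the observation that the serial table scan is the bottleneck.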

Multi-GPU Support

The extension automatically detects and utilizes multiple GPUs:

SELECT * FROM cudasp_scan(...);
-- Every detected GPU processes batches concurrently

GPU Assignment:

Round-robin thread assignment to GPUs
Independent CUDA streams per thread
Thread-safe per-device initialization
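The assignment scheme above can be modeled in a few lines. This is a hypothetical Python sketch, not the extension's C++ code: DevicePool, assign and init_fn are invented names, and init_fn stands in for whatever one-time per-device setup the extension performs. In the real extension each worker would also create its own CUDA stream on the assigned device (for example with cudaSetDevice followed by cudaStreamCreate).

```python
import threading


class DevicePool:
    """Hypothetical model: round-robin worker-to-GPU assignment with lazy,
    thread-safe, once-per-device initialization."""

    def __init__(self, num_devices, init_fn):
        self.num_devices = num_devices
        self._init_fn = init_fn                      # one-time setup per device (assumed)
        self._device_locks = [threading.Lock() for _ in range(num_devices)]
        self._initialized = [False] * num_devices
        self._next = 0
        self._next_lock = threading.Lock()

    def assign(self):
        """Return the next device index in round-robin order, initializing it if needed."""
        with self._next_lock:
            device = self._next % self.num_devices
            self._next += 1
        with self._device_locks[device]:             # per-device lock: GPUs can init in parallel
            if not self._initialized[device]:
                self._init_fn(device)
                self._initialized[device] = True
        return device
```

Using one lock per device, rather than a single global lock, lets different GPUs initialize at the same time while still guaranteeing each one is set up exactly once.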

Monitoring GPU Usage

# Real-time GPU monitoring (recommended)
nvtop

# Or use nvidia-smi, refreshing every 500 ms
nvidia-smi -lms 500

Technical Details

Dependencies

gECC: A fork of the gECC GPU elliptic curve cryptography library
NVIDIA CUDA Runtime (statically linked)
DuckDB 1.4.1

CUDA Optimizations

Column-major memory layout: Optimized for coalesced GPU memory access
Fixed-point multiplication: Precomputed base point multiples
Batch inversion: Efficient modular inverse using Montgomery's trick
Persistent L2 cache: Frequently accessed data is pinned in the GPU's L2 cache
Concurrent kernel execution: Multiple batches are processed simultaneously across GPUs
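Montgomery's trick replaces n modular inversions with a single inversion plus roughly 3n multiplications. This matters because a field inversion costs far more than a multiplication and is needed, for example, when converting projective points back to affine form. The Python sketch below shows the technique on plain integers modulo a prime; it is not gECC's implementation, and batch_inverse is a hypothetical name.

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo prime p using one modular inversion.

    Build prefix products, invert the total once, then walk backwards peeling
    off one factor at a time. All values must be nonzero mod p.
    """
    n = len(values)
    if n == 0:
        return []
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        acc = acc * v % p
        prefix[i] = acc                    # prefix[i] = v0 * v1 * ... * vi
    inv = pow(acc, -1, p)                  # the only inversion; ValueError if any v == 0 mod p
    out = [0] * n
    for i in range(n - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % p   # (v0..vi)^-1 * (v0..v(i-1)) = vi^-1
        inv = inv * values[i] % p          # now holds (v0..v(i-1))^-1
    out[0] = inv
    return out
```

On a GPU, the same pattern lets a batch of point conversions share one expensive inversion instead of paying for one per point.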

Error Handling

cudasp_scan behaves as follows for empty results and bad input:

Returns an empty result set if no matches are found
Throws an exception for invalid input formats
Validates BLOB sizes (32 bytes for scalars, 64 bytes for points)
Reports CUDA errors with detailed messages
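Callers can catch malformed arguments before a query reaches the GPU by mirroring the size rules above on the client side. The helper below is a hypothetical sketch: the 32/64-byte checks restate the documented rules, while the on-curve test is an extra safeguard that this README does not say the extension performs, and it assumes each 64-byte point is x || y with both coordinates little-endian.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime


def check_len(name, blob, expected):
    """Raise ValueError unless blob has exactly `expected` bytes."""
    if len(blob) != expected:
        raise ValueError(f"{name} must be {expected} bytes, got {len(blob)}")


def is_on_curve(point_blob):
    """True if a 64-byte x || y blob (little-endian coordinates, assumed) satisfies y^2 = x^3 + 7 mod P."""
    x = int.from_bytes(point_blob[:32], "little")
    y = int.from_bytes(point_blob[32:], "little")
    return x < P and y < P and (y * y - x * x * x - 7) % P == 0


def validate_scan_args(scan_private_key, spend_public_key, label_keys):
    """Raise ValueError for arguments that should not be passed to cudasp_scan."""
    check_len("scan_private_key", scan_private_key, 32)
    check_len("spend_public_key", spend_public_key, 64)
    for i, key in enumerate(label_keys):
        check_len(f"label_keys[{i}]", key, 64)
    points = [("spend_public_key", spend_public_key)]
    points += [(f"label_keys[{i}]", key) for i, key in enumerate(label_keys)]
    for name, pt in points:
        if not is_on_curve(pt):
            raise ValueError(f"{name} is not a secp256k1 point")
```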

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

gECC for GPU elliptic curve operations
BIP-352 Silent Payments specification
